CV

Basics

Name Haidar Khan
Label Research Scientist
Email haidark@gmail.com
Phone +1 (845) 633-0903
Url https://haidark.github.io
Summary A researcher trying to understand how machines can learn and how that can help humans solve challenging problems.

Work

  • 2023.08 - 2024.10
    Research Scientist
    National Center for AI, Saudi Data and AI Authority
    Building a national foundation model (ALLaM) for the Kingdom of Saudi Arabia serving the government and private sectors with capabilities in Arabic and English.
    • Created a training stack capable of training models at 49% MFU based on MegatronLM.
    • Pretrained and aligned our models on trillions of tokens of English and Arabic data.
    • ALLaM outperformed all open and closed models on automated and human evaluations in Arabic.
  • 2021.10 - 2023.08
    Senior Applied Scientist
    Amazon Alexa
    Large-scale training of language models (1B - 100B parameters) on web-scale datasets as part of the Alexa Teacher Model (AlexaTM) program.
    • Developed infrastructure to scale training of large language models using DeepSpeed.
    • Compressed large language models for natural language understanding, automatic speech recognition rescoring, and semantic parsing.
    • Combined AlexaTM with visual understanding and image generation models to create new multimodal Alexa experiences.
  • 2019.08 - 2021.10
    Applied Scientist
    Amazon Alexa
    Natural language understanding (NLU) research for virtual assistants including language modeling, semantic parsing, and intent/entity recognition.
    • Deployed efficient transformer-based models for Alexa NLU that satisfy production latency constraints (<10ms inference).
    • Led a team of 4 scientists and engineers to speed up sequence-to-sequence semantic parsing systems by 3x with parallel decoding.
  • 2018.05 - 2018.08
    Research Intern
    Siemens Corporate Technology
    Modeling agents and adversaries in a power plant network with reinforcement learning.
    • Increased the possible number of modeled agents by a factor of 2 with available hardware.
  • 2016.05 - 2017.08
    Research Intern
    IBM T.J. Watson Research Center
    Empirically studied the minibatch size/convergence rate tradeoff for deep neural network training.
    • Designed a variant of parallel SGD and analyzed its performance on benchmark datasets and networks.
    • The algorithm, implemented on an IBM HPC cluster, reduced total training time from 14 to 4 days.

Education

  • 2014.09 - 2016.05
    Troy, New York
    MS
    Rensselaer Polytechnic Institute
    Computer Science
  • 2014.09 - 2019.05
    Troy, New York
    PhD
    Rensselaer Polytechnic Institute
    Computer Science
  • 2009.09 - 2013.05
    New Paltz, New York
    BS
    SUNY New Paltz
    Computer Engineering

Languages

English
Native speaker
Urdu
Fluent
Pashto
Fluent
Arabic
Fluent