cv
Basics
Name | Haidar Khan |
Label | Research Scientist |
Email | haidark@gmail.com |
Phone | +1 (845) 633-0903 |
Url | https://haidark.github.io |
Summary | A researcher trying to understand how machines can learn and how that can help humans solve challenging problems. |
Work
-
2023.08 - 2024.10 Research Scientist
National Center for AI, Saudi Data and AI Authority
Building a national foundation model (ALLaM) for the Kingdom of Saudi Arabia serving the government and private sectors with capabilities in Arabic and English.
- Created a training stack capable of training models at 49% MFU based on MegatronLM.
- Pretrained and aligned our models on trillions of tokens of English and Arabic data.
- ALLaM outperformed all open and closed models on automated and human evaluations in Arabic.
-
2021.10 - 2023.08 Senior Applied Scientist
Amazon Alexa
Large-scale training of 1B - 100B parameter language models on web-scale datasets as part of the Alexa Teacher Model (AlexaTM) program.
- Developed infrastructure to scale training of large language models using DeepSpeed.
- Compressed large language models for natural language understanding, automatic speech recognition rescoring, and semantic parsing.
- Combined AlexaTM with visual understanding and image generation models to create new multimodal Alexa experiences.
-
2019.08 - 2021.10 Applied Scientist
Amazon Alexa
Natural language understanding (NLU) research for virtual assistants including language modeling, semantic parsing, and intent/entity recognition.
- Deployed efficient transformer-based models for Alexa NLU that satisfy production latency constraints (<10ms inference).
- Led a team of 4 scientists and engineers to speed up sequence-to-sequence semantic parsing systems by 3x with parallel decoding.
-
2018.05 - 2018.08 Research Intern
Siemens Corporate Technology
Modeling agents and adversaries in a power plant network with reinforcement learning.
- Increased the possible number of modeled agents by a factor of 2 with available hardware.
-
2016.05 - 2017.08 Research Intern
IBM T.J. Watson Research Center
Empirically studied the minibatch size/convergence rate tradeoff for deep neural network training.
- Designed a variant of parallel SGD and analyzed its performance on benchmark datasets and networks.
- The algorithm implemented on an IBM HPC cluster reduced total training time from 14 to 4 days.
Education
Languages
English | Native speaker |
Urdu | Fluent |
Pashto | Fluent |
Arabic | Fluent |