cv
Basics
Name | Haidar Khan |
Label | Research Scientist |
Email | haidark@gmail.com |
Phone | +1 (845) 633-0903 |
Url | https://haidark.github.io |
Summary | A researcher trying to understand how machines can learn and how that can help humans solve challenging problems. |
Work
-
2023.08 - 2024.10 Research Scientist
National Center for AI, Saudi Data and AI Authority
Building a national foundation model (ALLaM) for the Kingdom of Saudi Arabia serving the government and private sectors with capabilities in Arabic and English.
- Created a training stack capable of training models at 49% MFU based on MegatronLM.
- Pretrained and aligned our models on trillions of tokens of English and Arabic data.
- ALLaM outperformed all open and closed models on automated and human evaluations in Arabic.
-
2021.10 - 2023.08 Senior Applied Scientist
Amazon Alexa
Large-scale training of 1B - 100B parameter language models on web-scale datasets as part of the Alexa Teacher Model (AlexaTM) program.
- Developed infrastructure to scale training of large language models using DeepSpeed.
- Compressed large language models for natural language understanding, automatic speech recognition rescoring, and semantic parsing.
- Combined AlexaTM with visual understanding and image generation models to create new multimodal Alexa experiences.
-
2019.08 - 2021.10 Applied Scientist
Amazon Alexa
Natural language understanding (NLU) research for virtual assistants including language modeling, semantic parsing, and intent/entity recognition.
- Deployed efficient transformer-based models for Alexa NLU that satisfy production latency constraints (<10ms inference).
- Led a team of 4 scientists and engineers to speed up sequence-to-sequence semantic parsing systems by 3x with parallel decoding.
-
2018.05 - 2018.08 Research Intern
Siemens Corporate Technology
Modeling agents and adversaries in a power plant network with reinforcement learning.
- Increased the possible number of modeled agents by a factor of 2 with available hardware.
-
2016.05 - 2017.08 Research Intern
IBM T.J. Watson Research Center
Empirically studied the minibatch size/convergence rate tradeoff for deep neural network training.
- Designed a variant of parallel SGD and analyzed its performance on benchmark datasets and networks.
- The algorithm implemented on an IBM HPC cluster reduced total training time from 14 to 4 days.
Education
Languages
English | Native speaker |
Urdu | Fluent |
Pashto | Fluent |
Arabic | Fluent |