Research Scientist Intern

ML System

ByteDance

San Jose, CA, USA

In Person

AML-MLsys combines systems engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems/services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.

In our team, you'll have the opportunity to build large-scale heterogeneous systems integrating GPU/NPU/RDMA/storage and keep them running stably and reliably, enrich your expertise in coding, performance analysis, and distributed systems, and be involved in the decision-making process. You'll also be part of a global team with members from the United States, China, and Singapore working collaboratively toward a unified project direction.

We are looking for talented individuals to join us for an internship in 2026. Internships at ByteDance aim to offer students industry exposure and hands-on experience. Watch your ambitions become reality as your inspiration brings infinite opportunities at ByteDance.

PhD internships at ByteDance provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.

Applications will be reviewed on a rolling basis; we encourage you to apply early. Please state your availability clearly in your resume (start date, end date). Candidates who pass resume screening will be invited to participate in ByteDance's technical online assessment.

Responsibilities:
- Develop and optimize LLM training, inference, and reinforcement learning frameworks.
- Work closely with model researchers to scale LLM training and reinforcement learning to the next level.
- Drive GPU and CUDA performance optimization to create an industry-leading, high-performance LLM training, inference, and RL engine.
Minimum Qualifications:
- Currently pursuing a PhD in computer science, automation, electronics engineering, or a related technical discipline.
- Proficient in algorithms and data structures; familiar with Python.
- Understanding of the basic principles of deep learning algorithms, familiarity with basic neural network architectures, and experience with deep learning training frameworks such as PyTorch.

Preferred Qualifications:
- Proficient in GPU high-performance computing optimization with CUDA; in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computing, etc.
- Familiar with FSDP, DeepSpeed, JAX SPMD, Megatron-LM, verl, TensorRT-LLM, ORCA, vLLM, SGLang, etc.
- Knowledge of LLM models; experience accelerating LLM model optimization is preferred.

By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy

If you have any questions, please reach out to us at [email protected]