Internship
Machine Learning Kernel Performance Engineer, Dojo

Tesla
No salary listed
Palo Alto, CA, USA
In Person
Internship requires a minimum of 12 weeks, full-time and on-site.
Job Description
Consider before submitting an application:
This position is expected to start around August or September 2025 and continue through the Fall term (ending approximately December 2025), or into Winter/Spring 2026 if an opportunity is available. We ask for a minimum of 12 weeks, full-time and on-site, for most internships. Our internship program is for students who are actively enrolled in an academic program. Recent graduates seeking employment after graduation and not returning to school should apply for full-time positions, not internships.
International Students: If your work authorization is through CPT, please consult your school on your ability to work 40 hours per week before applying. You must be able to work 40 hours per week on-site. Many students will be limited to part-time during the academic year.
As a member of the Dojo Machine Learning software team, you will be responsible for developing and optimizing machine learning kernels to run on a massively parallel machine. The ideal candidate will have a background in kernel development and performance optimization for AI workloads, with a passion for delivering high-performance implementations that are simple to use.
Job Responsibilities
Develop and validate datapath kernels on a massively parallel machine
Design and deliver implementations of known neural network algorithms on Dojo hardware, for both training and inference workloads
Collaborate with architects and compiler engineers to transfer kernel designs to the Dojo compiler and production environment
Work closely with the team to drive improvements to the kernel development framework, supporting environments, and other components of the system
Work with hardware and simulations teams to ensure future hardware generations optimally satisfy algorithmic requirements
Participate in code reviews, testing, and debugging to ensure high-quality software
Stay up-to-date with the latest developments in AI workloads, domain-specific languages, computer architecture, and simulation techniques
Job Requirements
Pursuing a Master's degree or higher in Engineering, Computer Science, Mathematics, AI, or similar, with a graduation date between December 2025 and May 2026
Experience in kernel development, computer architecture, and AI workloads
Understanding of Large Language Models (LLMs), transformer architectures, their training, inference, and optimization
Strong programming skills in languages such as C/C++ and Python
Experience with deep learning frameworks such as PyTorch, JAX, or Pallas
Strong understanding of CPU and/or GPU microarchitecture, including pipelining, caching, and memory hierarchy
Experience writing, analyzing, and optimizing assembly kernels
Strong communication and collaboration skills, with the ability to work effectively with architects, engineers, and researchers
Performance analysis experience, including with performance simulation frameworks, preferred
Familiarity with parallel programming models such as CUDA and OpenMPI, preferred
Experience with Continuous Integration and testing frameworks such as Bazel or Pytest, preferred
