Internship

Machine Learning Kernel Performance Engineer, Dojo

Tesla

No salary listed

Palo Alto, CA, USA

In Person

Internship requires a minimum of 12 weeks, full-time and on-site.

Job Description

Consider before submitting an application:    

This position is expected to start around August or September 2025 and continue through the Fall term (ending approximately December 2025), or into Winter/Spring 2026 if the opportunity is available. We ask for a minimum of 12 weeks, full-time and on-site, for most internships. Our internship program is for students who are actively enrolled in an academic program. Recent graduates seeking employment after graduation and not returning to school should apply for full-time positions, not internships.

International Students: If your work authorization is through CPT, please consult your school about your ability to work 40 hours per week before applying. You must be able to work 40 hours per week on-site. Many students will be limited to part-time work during the academic year.

As a member of the Dojo Machine Learning software team, you will be responsible for developing and optimizing machine learning kernels to run on a massively parallel machine. The ideal candidate will have a background in kernel development and performance optimization for AI workloads, with a passion for delivering high-performance implementations that are simple to use.

Job Responsibilities

  • Develop and validate datapath kernels on a massively parallel machine 

  • Design and deliver implementations of known neural network algorithms on Dojo hardware, for both training and inference workloads

  • Collaborate with architects and compiler engineers to transfer kernel designs to the Dojo compiler and production environment 

  • Work closely with the team to drive improvements to the kernel development framework, supporting environments, and other components of the system 

  • Work with hardware and simulations teams to ensure future hardware generations optimally satisfy algorithmic requirements 

  • Participate in code reviews, testing, and debugging to ensure high-quality software 

  • Stay up-to-date with the latest developments in AI workloads, domain-specific languages, computer architecture, and simulation techniques 

Job Requirements

  • Pursuing a Master's degree or higher in Engineering, Computer Science, Mathematics, AI, or a similar field, with a graduation date between December 2025 and May 2026

  • Experience in kernel development, computer architecture, and AI workloads 

  • Understanding of Large Language Models (LLMs) and transformer architectures, including their training, inference, and optimization

  • Strong programming skills in languages such as C/C++ and Python 

  • Experience with deep learning frameworks such as PyTorch, JAX, or Pallas

  • Strong understanding of CPU and/or GPU microarchitecture, including pipelining, caching, and memory hierarchy 

  • Experience writing, analyzing, and optimizing assembly kernels

  • Strong communication and collaboration skills, with the ability to work effectively with architects, engineers, and researchers 

  • Performance analysis experience, including with performance simulation frameworks, preferred

  • Familiarity with parallel programming models such as CUDA and Open MPI preferred

  • Experience with continuous integration and testing frameworks such as Bazel or pytest preferred