Internship in the Field of Vision-Language-Action Models for Manipulation Robotics
Posted on 9/10/2025

Robert Bosch Venture Capital
No salary listed
Stuttgart, Germany
In Person
Job Description
Vision-Language-Action (VLA) models are promising candidates for general robot policies that can be widely deployed in human environments. These models map a task description given in language, together with image observations, to robot motion that fulfills the task, often with a focus on object manipulation. While software pipelines for training VLAs with behavioral cloning are readily available, fine-tuning a VLA for successful task execution still requires collecting one's own data. However, high-quality robot data is expensive and time-consuming to collect. Numerous recent publications propose smaller models that cope better with limited data.
During this internship, you will focus on the following tasks:
- You will collect data via teleoperation on bimanual robots.
- After that, you will carry out data management and visualization.
- Furthermore, you will adapt novel VLA models and train them with the collected data.
- Finally, you will analyze VLA performance and scaling potential.
Qualifications
- Education: Master's studies with good grades
- Experience and Knowledge: good coding skills with Python and Copilot; knowledge of ML libraries like PyTorch and JAX; experience with robotics and related software, like ROS; experience with developing and managing Python packages (Git, PRs, issues, etc.); ability to read and understand research papers published at major conferences (CoRL, NeurIPS, ICML, ICLR, RSS, ICRA, IROS)
- Personality and Working Practice: you are a team player who is also able to work independently
- Work Routine: on-site in Renningen and mobile working
- Languages: very good English
