Internship in the Field of Vision-Language-Action Models for Manipulation Robotics

Posted on 9/10/2025

Robert Bosch Venture Capital

No salary listed

Stuttgart, Germany

In Person

Job Description

Vision-Language-Action (VLA) models are promising candidates for general robot policies that can be widely deployed in human environments. These models map a task description given in natural language, together with image observations, to robot motion that fulfills the task, often with a focus on object manipulation. While software pipelines for training VLAs with behavioral cloning are readily available, successful task execution still requires collecting one's own data for fine-tuning. However, high-quality robot data is expensive and time-consuming to collect. Numerous recent publications propose smaller models that aim to cope better with this data limitation.

During this internship, you will focus on the following tasks:

  • You will collect data via teleoperation on bimanual robots.
  • After that, you will carry out data management and visualization.
  • Furthermore, you will adapt novel VLA models and train them with the collected data.
  • Finally, you will analyze VLA performance and scaling potential.

Qualifications

  • Education: Master's studies with good grades
  • Experience and Knowledge: good coding skills with Python and Copilot; knowledge of ML libraries like PyTorch and JAX; experience with robotics and related software, like ROS; experience with developing and managing Python packages (Git, PRs, issues, etc.); ability to read and understand research papers published at major conferences (CoRL, NeurIPS, ICML, ICLR, RSS, ICRA, IROS)
  • Personality and Working Practice: you are a team player who is also able to work independently
  • Work Routine: on-site in Renningen and mobile working
  • Languages: very good English