Job Description

Vision-Language-Action (VLA) models are promising candidates for general robot policies that can be widely deployed in human environments. These models map a task description, given as language instructions together with image observations, to robot motions that fulfill the task, often with a focus on object manipulation. While software pipelines for training VLAs with behavioral cloning are readily available, successful task execution still requires collecting one's own data to fine-tune the models. However, high-quality robot data is expensive and time-consuming to collect. Numerous recent publications propose smaller models that cope better with limited data.

During this internship, you will focus on the following tasks:

You will collect data via teleoperation on bimanual robots.
After that, you will carry out data management and visualization.
Furthermore, you will adapt novel VLA models and train them with the collected data.
Finally, you will analyze VLA performance and scaling potential.

Qualifications

Education: Master's studies with good grades
Experience and Knowledge: good Python coding skills, including work with Copilot; knowledge of ML libraries such as PyTorch and JAX; experience with robotics and related software, such as ROS; experience with developing and managing Python packages (Git, PRs, issues, etc.); ability to read and understand research papers published at major conferences (CoRL, NeurIPS, ICML, ICLR, RSS, ICRA, IROS)
Personality and Working Practice: you are a team player who is also able to work independently
Work Routine: on-site in Renningen and mobile working
Languages: very good English skills

Internship in the Field of Vision-Language-Action Models for Manipulation Robotics

Robert Bosch Venture Capital