Multimodal Data Intern (PhD, Fall 2025)
Date: Jun 17, 2025
Location: San Francisco, US
Company: Dolby Laboratories, Inc.
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
The Advanced Technology Group (ATG) is the research division of the company. ATG’s mission is to look ahead, deliver insights, and innovate technological solutions that will fuel Dolby’s continued growth. Our researchers have a broad range of expertise related to computer science and electrical engineering, such as AI/ML, algorithms, digital signal processing, audio engineering, image processing, computer vision, data science & analytics, distributed systems, cloud, edge & mobile computing, computer networking, and IoT.
Summary:
Join our team to shape the future of automotive experiences by leveraging cutting-edge technologies and diverse data sources! As a Data Analyst Intern, you’ll play a crucial role in analyzing multi-modal data from cars to enhance user experiences. You’ll work with various data sources, including audio, video, sensors, and natural language interactions. Your insights will help shape the future of in-car audio experiences.
Responsibilities:
- Multi-Modal Data Collection and Analysis:
  - Work with a car data capture system to extract and integrate data from different automotive modalities (audio, video, GPS, OBD2).
  - Develop preprocessing pipelines to handle synchronized multi-modal data.
- Machine Learning Model Development:
  - Utilize machine learning techniques to create models for various tasks:
    - Audio Processing Models:
      - Extract relevant features from audio data.
      - Train regression and classification models for audio processing and noise detection.
    - Computer Vision Models:
      - Implement computer vision models for cockpit analysis, passenger occupancy, and driver attention.
      - Perform semantic segmentation on video frames.
    - Sensor Fusion and Tracking:
      - Combine lidar, radar, and GPS data.
      - Track objects in real time.
    - Natural Language Processing (NLP):
      - Process language (text and/or waveform) for in-car interactions.
- Model Evaluation and Optimization:
  - Evaluate the performance of multi-modal models using appropriate metrics.
  - Optimize models based on user feedback and real-world scenarios.
Requirements:
- Education:
  - Pursuing a PhD degree in Computer Science, Audiovisual, Electrical Engineering, Design, Mathematics, Physics, or related fields.
- Technical Skills:
  - Experience with command-line interfaces
  - Experience with machine learning
  - Familiarity with cloud systems
  - Passion for multi-modal data analysis
  - Experience with hardware prototyping
We recognize that applying to internships can be a daunting process, especially if you have not had any previous internships, or if this is a new field of interest to you. Even if you do not have previous professional experience to meet the listed qualifications, we will consider relevant project, volunteer, and extracurricular experiences, and we encourage you to include them on your resume and apply.
Eligibility
Working towards a PhD in Computer Science, Audiovisual, Electrical Engineering, Design, Mathematics, Physics, or related fields; recent grads who are within 6 months of graduation are also eligible to apply. Must be available to work full-time Monday through Friday for 3 months between September 2025 and December 2025.
The internship start date is as follows (note: this date is not flexible):
- Monday, September 22, 2025
The San Francisco/Bay Area base salary range for this full-time position is $57/hr, which can vary outside this location, plus bonus and benefits; some roles may also include equity. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, competencies, experience, market demands, internal parity, and relevant education or training. Your recruiter can share more about the specific salary range and perks and benefits for your location during the hiring process.
Dolby will consider qualified applicants with criminal histories in a manner consistent with the requirements of San Francisco Police Code, Article 49, and Administrative Code, Article 12.
Equal Employment Opportunity:
Dolby is proud to be an equal opportunity employer. Our success depends on the combined skills and talents of all our employees. We are committed to making employment decisions without regard to race, religious creed, color, age, sex, sexual orientation, gender identity, national origin, religion, marital status, family status, medical condition, disability, military service, pregnancy, childbirth and related medical conditions or any other classification protected by federal, state, and local laws and ordinances.