Senior Generative AI Researcher
Date: Feb 6, 2025
Location: Bangalore, IN
Company: Dolby Laboratories, Inc.
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
Dolby’s research division is looking for an AI researcher to join Dolby’s research efforts to develop the next generation of AI-based multimodal technologies. The candidate will work with Dolby’s world-class audio and vision experts to invent new multimedia analysis, processing, and rendering technologies that drive new interactive and immersive experiences. As part of an international team, the Senior Generative AI Researcher will work on ideas exploring new horizons in multimodal analysis, processing, and generation. The researcher is responsible for performing fundamental new research, transferring technology to product groups, drafting patent applications, and publishing papers.
Summary
Dolby’s research division is currently looking for a talented, self-motivated AI researcher to push the boundaries of the state of the art in media technologies. The ideal candidate has a strong background in deep learning, combining conceptual understanding with practical experience. A core aspect of this role involves staying up to date with the latest literature and driving innovation by implementing cutting-edge techniques in generative models, self-supervised learning, and multimodal learning. Consequently, knowledge of or experience in any or all of the following is helpful:
- Generative modeling for audio and video applications (diffusion models, autoregressive models, masked generative transformers, etc.).
- Multimodal semantic understanding and multimodal reasoning.
- Multimodal representations (that include audio and/or video).
- Multimodal AI architectures, with a focus on generating audio and/or video (text-to-audio, text-to-video, image-to-audio, etc.).
- AI-driven enhancement, processing, and generation (for audio and/or video applications).
- Exposure to audio and video applications (source separation, text-to-speech, music generation, image generation, image captioning, question answering, etc.).
With the explosion of large language models and natural language processing, the candidate will join a team of top-tier researchers and work closely with Dolby’s Machine Perception and Reasoning team on challenging problems in multimodal AI for entertainment applications. You will focus on the creation and implementation of multimodal AI technologies, from the underlying theoretical concepts to the development of prototypes and demonstrations, with the goal of creating new experiences.
The role will involve prototyping inspiring experiences that explore a complement of modalities. These technologies will be used to extend immersion and interaction, so the candidate should be willing to explore empirical refinement of the user experience.
Main responsibilities:
- Work closely with other domain experts to refine and execute Dolby’s technical strategy in artificial intelligence and machine learning.
- Use deep learning to create new solutions and enhance existing applications.
- Push the state of the art and develop intellectual property.
- Mentor interns on novel research problems.
- Publish papers in top-tier conferences and journals.
- Transfer technology to product groups and draft patent applications.
- Advise internal leaders on recent deep learning advancements in the industry and academia to further influence research direction and business decisions.
Requirements:
- Ph.D. in computer science or similar, with a focus on deep learning.
- Experience in AI applied to audio and/or video is a requirement.
- Technical knowledge and experience in generative modeling for audio (music, speech, sfx) or video.
- Strong publication record, with publications in top-tier domain-specific conferences (e.g., CVPR, ICCV, ECCV, ICASSP, ISMIR) and/or machine learning conferences (e.g., NeurIPS, ICLR, ICML).
- Deep knowledge of current machine learning literature.
- Highly skilled in Python and one or more popular deep learning frameworks (TensorFlow or PyTorch).
- Creativity and the ability to envision new technologies and turn them into innovative products.
- Good communication skills.