Associate Data Scientist, NLP/LLM Infrastructure

Date: Jul 7, 2025

Location: Wroclaw, PL

Company: Dolby Laboratories, Inc.

Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.

 

We are seeking a Data Engineer specializing in linguistic data infrastructure to join our team. With guidance from senior team members, this role will focus on implementing and maintaining data pipelines and storage solutions that support our Natural Language Processing (NLP) and Large Language Model (LLM) initiatives. The ideal candidate will combine data engineering fundamentals with an interest in linguistic data structures to help build scalable systems that effectively process, store, and deliver text data for AI applications. This role offers significant growth opportunities to develop expertise in linguistic data engineering and AI infrastructure.

 

Responsibilities

Under general guidance:

  • Linguistic Data Pipeline Development: Assist in designing, implementing, and maintaining ETL pipelines for harvesting, cleaning, and processing large text corpora from various sources.
  • Text Data Infrastructure: Help build and optimize database schemas and storage solutions specifically designed for linguistic data, with a focus on efficient querying and retrieval of text patterns.
  • Database Management: Contribute to building and maintaining specialized text corpora for training domain-specific language models, with a focus on terminology extraction and style pattern identification in aggregated databases.
  • Data Quality & Governance: Implement data validation processes to ensure linguistic data quality, consistency, and compliance with relevant standards and requirements.
  • Cross-functional Collaboration: Work with stakeholders to understand requirements and help translate them into data engineering solutions that support Language Engineering and AI R&D initiatives.

 

Requirements:

  • Bachelor's degree in Linguistics, Computer Science, Software Engineering, Data Engineering, or related technical field
  • Basic understanding of NLP concepts and text processing techniques
  • Basic to intermediate experience with Python programming for data engineering work
  • Exposure to prompt engineering, prompt tuning, and interacting with LLMs
  • Familiarity with SQL and database design principles
  • Knowledge of data modeling concepts for structured and unstructured data
  • Ability to follow plans and to collaborate with project leads to meet goals

 

Nice to Have: 

  • Experience with PyTorch and Neo4j
  • Coursework or projects building knowledge graphs
  • Knowledge of linguistics or computational linguistics
  • Experience working with large text corpora
  • Experience with text vectorization and embedding techniques

 

Dolby Hiring Entity:

Dolby Poland Sp. z o.o.
Business Garden, Building G
ul. Legnicka 48
Wrocław, 54-202, Poland