Sr Staff Operational Support Engineer
Date: Jun 2, 2026
Location: Atlanta, US
Company: Dolby Laboratories, Inc.
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We’re big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
The Dolby Cloud Solutions organization builds technologies and innovations that easily integrate into service providers’ infrastructure to make content experiences more effective, meaningful, and engaging for consumers.
Dolby OptiView is building a world-class Operational Support organization responsible for the stability, availability, and operational maturity of our 24/7 live video streaming, ads, player, and real-time delivery platforms. As a Senior Operational Support Engineer, you provide technical and operational leadership for the most complex, high-impact production scenarios. You act as the escalation point beyond L2, lead critical incident response for Tier‑1 customers and marquee live events, and drive systemic improvements across reliability, automation, and operational readiness.
This role goes beyond incident resolution. You shape how we operate at scale, influence platform design through operational feedback, and partner deeply with Engineering, DevOps, Product, and Support leadership to continuously raise the reliability bar for OptiView.
Key Responsibilities
Incident Leadership & Escalations
- Serve as the final operational escalation point for severe, complex, or prolonged customer-impacting incidents
- Lead resolution of multi-system, multi-team incidents spanning streaming pipelines, player platforms, ad insertion, DRM, CDN, and real-time services
- Own incident command during major live events, including decision-making under pressure and risk-based trade-offs
- Drive high-quality, executive- and customer-facing incident communications during critical situations
- Coach and support L2 engineers during live incidents, providing guidance and oversight without taking ownership away unnecessarily
Advanced Production Operations & IaC
- Operate confidently and independently on production environments with broad system-level awareness
- Design, review, and approve complex production changes using Infrastructure as Code as the default mechanism
- Deep expertise across:
  - Terraform
  - Helm & Kubernetes manifests
  - GitOps workflows
  - CI/CD and deployment pipelines
- Partner with Engineering and DevOps to:
  - Improve deployment safety and rollback strategies
  - Define operational guardrails and blast-radius controls
  - Influence platform architecture with operability and resilience in mind
AI-Driven Operations, Automation & Tooling
- Lead adoption of AI-augmented operations across the support organization
- Define and evolve:
  - AI-assisted incident triage and prioritization
  - Automated and semi-automated runbooks
  - Intelligent alert correlation and noise reduction
- Use AI and automation to:
  - Reduce mean time to detect (MTTD) and resolve (MTTR)
  - Identify systemic patterns across incidents and customers
  - Improve the quality and consistency of incident communications
- Champion an automation-first mindset, identifying opportunities where manual operational work should be eliminated entirely
Operational Readiness & Live Event Excellence
- Own operational readiness for high-risk, high-visibility customer events
- Lead pre-event planning and validation, including:
  - Architecture and risk reviews
  - Runbook and escalation path validation
  - Monitoring, alerting, and SLO coverage assessment
- Design and rehearse incident response strategies for worst-case scenarios
- Act as a trusted operational advisor to strategic customers before, during, and after major events
On-Call & 24/7 Operations
- Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a global support model
- Ensure smooth handovers between shifts and regions
- Respond to critical alerts within defined SLAs for stream health, player errors, and delivery infrastructure
Root Cause & Continuous Improvement
- Perform or contribute to root cause analysis (RCA) for production incidents
- Document findings, corrective actions, and preventive measures
- Identify recurring issues and work with Engineering and Product teams to eliminate them permanently
- Contribute to and improve runbooks, operational playbooks, and knowledge bases for all OptiView products (player, ads, live streaming, and real-time streaming)
Collaboration & Engineering Feedback Loop
- Work closely with Engineering teams to escalate defects, validate fixes, and support production deployments
- Provide feedback on system observability, tooling gaps, and operational risks
- Act as the operational voice during post-incident reviews
Required Skills & Experience
Technical Skills
- 8+ years of relevant experience in operational, support, or similar customer‑facing roles
- Proven ability to own complex problems end‑to‑end and operate with a high degree of autonomy
- Experience influencing decisions and outcomes beyond individual contribution
- Deep experience operating and supporting large-scale, production video streaming platforms
- Solid troubleshooting skills across distributed systems (APIs, microservices, cloud infrastructure)
- Expert understanding of HLS, DASH, CMAF, WebRTC, DRM, and CDN architectures
- Advanced experience working with monitoring, alerting, and logs to diagnose live incidents (Grafana, Kibana/ELK, Prometheus, Loki)
- Ability to correlate backend streaming metrics, player telemetry, and CDN signals to diagnose live customer issues end-to-end
- Proven ability to safely execute complex production changes under pressure
Leadership & Operational Mindset
- Demonstrated leadership during high-severity, customer-impacting incidents
- Strong sense of ownership and accountability for customer outcomes
- Excellent written and verbal communication skills, including customer-facing communication during incidents
The Atlanta Area base salary range for this full-time position is $152,200 - $209,200, which can vary if outside this location, plus bonus and benefits; some roles may also include equity. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, competencies, experience, market demands, internal parity, and relevant education or training. Your recruiter can share more about the specific salary range and perks and benefits for your location during the hiring process.
Dolby will consider qualified applicants with criminal histories in a manner consistent with the requirements of San Francisco Police Code, Article 49, and Administrative Code, Article 12.
Equal Employment Opportunity:
Dolby is proud to be an equal opportunity employer. Our success depends on the combined skills and talents of all our employees. We are committed to making employment decisions without regard to race, religious creed, color, age, sex, sexual orientation, gender identity, national origin, religion, marital status, family status, medical condition, disability, military service, pregnancy, childbirth and related medical conditions or any other classification protected by federal, state, and local laws and ordinances.
Nearest Major Market: Atlanta