Product Manager - AI Observability/Evaluation Platform
Description
We are looking for a Product Manager to join the ML/AI Platform team at Spotify.
The team helps groups across Spotify build, deliver, and run ML- and AI-enabled experiences at scale. We are hiring a Product Manager to own AI Observability and Evaluation: platform tools that help teams track LLM behavior in production, resolve issues quickly, measure performance, and continuously improve our AI products.
This is an internal platform role with highly technical customers (ML engineers, backend engineers, data scientists). You will partner closely with engineering leaders to define the paved road for instrumenting, monitoring, and operating AI workloads end-to-end.
What You'll Do
- Own the product roadmap for Spotify’s AI observability capabilities across instrumentation, data contracts, and debugging workflows.
- Collaborate with stakeholders to develop the roadmap for Spotify’s AI/agentic evaluation capabilities via LLM-as-a-judge and other approaches.
- Build golden-path instrumentation defaults (SDKs/libraries/templates) that make LLM workloads observable by default.
- Partner with engineering to deliver root-cause workflows across LLM chains and agents (what changed, where time/cost is spent, which step fails, which prompt/template/model version regressed).
- Drive adoption with internal teams through discovery, pilots, documentation, and ongoing enablement.
- Define and track success metrics (coverage, time-to-first-trace, regressions caught pre-prod, MTTR, cost anomalies detected early).
- Partner with Security to establish safe logging and retention guidance that supports both compliance and debuggability.
Who You Are
- 2+ years of experience as a Product Manager, Technical Product Manager, or equivalent product ownership role.
- Familiar with ML and AI development flows, and comfortable reasoning about LLM application patterns (prompting, RAG/agents, evaluation basics).
- Aware of LLM evaluation approaches (golden datasets, regression testing, human-in-the-loop review, LLM-as-a-judge).
- Comfortable partnering with engineers on technical tradeoffs and system design, even if you are not writing production code.
- Strong product fundamentals: discovery, prioritization, roadmap communication, and stakeholder alignment.
- Metrics-oriented and pragmatic; you can define outcomes and iterate based on evidence.
Where You'll Be
This role is based in Toronto. We offer you the flexibility to work where you work best! There will be some in-person meetings, but the role still allows the flexibility to work from home.