Spotify - New York, NY

Data Engineer II - Gen AI - Music

Posted 2 months ago

Description

We are seeking Data Engineers to join our Artist-First AI Music lab. Our team designs and builds state-of-the-art generative products for music that create breakthrough experiences for fans and artists. We invent entirely new listening experiences that center and celebrate artists and creatives. All of our products will put artists and songwriters first, through these four principles:

Partnerships with record labels, distributors, and music publishers: We’ll develop new products for artists and fans through upfront agreements, not by asking for forgiveness later.

Choice in participation: We recognize there’s a wide range of views on use of generative music tools within the artistic community. Therefore, artists and rightsholders will choose if and how to participate to ensure the use of AI tools aligns with the values of the people behind the music.

Fair compensation and new revenue: We will build products that create wholly new revenue streams for rightsholders, artists, and songwriters, ensuring they are properly compensated for uses of their work and transparently credited for their contributions.

Artist-fan connection: AI tools we develop will not replace human artistry. They will give artists new ways to be creative and connect with fans. We will leverage our role as the place where more than 700 million people already come to listen to music every month to ensure that generative AI deepens artist-fan connections.

What You'll Do

Build and maintain large-scale data pipelines, including ML pipelines, with data processing frameworks like Scio and Python-based tools on Google Cloud Platform.

Leverage data engineering best practices in continuous integration and delivery.

Help drive optimization, testing, and tooling to improve data quality and reliability.

Collaborate with engineers, product managers, subject matter experts, and stakeholders while taking on learning and leadership opportunities that arise every day.

Work in cross-functional, agile teams to continuously experiment, iterate, and deliver on new product objectives.

Who You Are

You have at least 3 years of professional experience working in a product-driven environment.

You have experience working with high-volume, heterogeneous data using distributed systems and big data technologies such as Python, Scala (e.g., Scio), Ray, Apache Spark, or similar frameworks used for distributed data processing.

You are proficient in designing and building distributed data pipelines in Python, Scala, or Java, with experience in frameworks like Scio on platforms such as Dataflow.

You understand data modeling, data access, and data storage techniques, and can apply them to both batch and analytical processing (e.g., using BigQuery for analysis).

You value iterative software processes, data-driven development, reliability, and responsible experimentation, with attention to cost efficiency and best practices in data engineering.

You thrive in collaborative environments and enjoy working with cross-functional teams.

You are a creative problem solver who is passionate about building outstanding products that add real value to millions of people.

You are enthusiastic about learning more about turning research ideas into products operating at scale.

Where You'll Be

We offer you the flexibility to work where you work best! For this role, you can be within the Eastern United States region as long as we have a work location. This team operates within the Eastern time zone for collaboration.

Skills

BigQuery, Apache, Scala, Data Engineering, Machine Learning, Agile, GCP, Java, Python, Apache Spark