Good afternoon! Here's your data engineer interview problem for today. This question was asked by Spotify.
Your task is to design a data pipeline that analyzes user listening habits and generates personalized playlist recommendations.
Data Sources:
user_activity: Contains user listening data.
song_metadata: Contains details about songs.
user_preferences: Contains each user's preferred genre.
Task:
Write SQL queries to:
Calculate total listening time per user.
Identify the most-listened-to song for each user.
Recommend three songs from the user's preferred genre that they haven't listened to, ordered by release year (newest first).
Use the provided sample data to test your queries.
Sample Data
Before running your queries, set up your SQL environment with the tables and data structured as specified.
SQL Script for Table Creation and Data Insertion:
Access the SQL script here.
Solution & Explanation
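A minimal sketch of one way to approach the three queries, run through Python's built-in sqlite3 module so it can be tried without a database server. The column names (listen_seconds, title, genre, release_year, preferred_genre) and the toy rows are assumptions made for illustration, not the provided sample data, so adapt them to the actual script. "Most listened" is taken here to mean highest total listening time, which is also an assumption; counting plays instead would be an equally reasonable reading. The window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical schema and toy rows: every column name below is an assumption,
# not the schema from the original SQL script.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activity (user_id INTEGER, song_id INTEGER, listen_seconds INTEGER);
CREATE TABLE song_metadata (song_id INTEGER PRIMARY KEY, title TEXT, genre TEXT, release_year INTEGER);
CREATE TABLE user_preferences (user_id INTEGER PRIMARY KEY, preferred_genre TEXT);

INSERT INTO song_metadata VALUES
  (1, 'Song A', 'Rock', 2015), (2, 'Song B', 'Rock', 2018),
  (3, 'Song C', 'Rock', 2020), (4, 'Song D', 'Rock', 2022),
  (5, 'Song E', 'Rock', 2023), (6, 'Song F', 'Jazz', 2019),
  (7, 'Song G', 'Jazz', 2021);
INSERT INTO user_preferences VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO user_activity VALUES
  (1, 1, 200), (1, 1, 180), (1, 6, 240),
  (2, 6, 300), (2, 2, 150);
""")

# 1. Total listening time per user.
total_time = conn.execute("""
    SELECT user_id, SUM(listen_seconds) AS total_seconds
    FROM user_activity
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# 2. Most listened song per user, ranked by total seconds (ties broken by song_id).
top_song = conn.execute("""
    WITH per_song AS (
        SELECT user_id, song_id, SUM(listen_seconds) AS seconds
        FROM user_activity
        GROUP BY user_id, song_id
    ),
    ranked AS (
        SELECT user_id, song_id,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY seconds DESC, song_id
               ) AS rn
        FROM per_song
    )
    SELECT user_id, song_id FROM ranked WHERE rn = 1 ORDER BY user_id
""").fetchall()

# 3. Up to three unheard songs from the preferred genre, newest first.
#    The NOT EXISTS filter runs before the window function numbers the rows.
recommendations = conn.execute("""
    WITH candidates AS (
        SELECT p.user_id, s.song_id, s.release_year,
               ROW_NUMBER() OVER (
                   PARTITION BY p.user_id ORDER BY s.release_year DESC, s.song_id
               ) AS rn
        FROM user_preferences p
        JOIN song_metadata s ON s.genre = p.preferred_genre
        WHERE NOT EXISTS (
            SELECT 1 FROM user_activity a
            WHERE a.user_id = p.user_id AND a.song_id = s.song_id
        )
    )
    SELECT user_id, song_id, release_year
    FROM candidates
    WHERE rn <= 3
    ORDER BY user_id, release_year DESC
""").fetchall()

print(total_time)
print(top_song)
print(recommendations)
```

NOT EXISTS is used instead of NOT IN so that a NULL song_id in user_activity cannot silently empty the recommendation list. A user with fewer than three unheard songs in their preferred genre simply gets fewer rows, as user 2 does in the toy data.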