Table of Contents

Project Statement and Goals
Motivation and Background
Data Description
EDA
Data Cleaning
Metrics
Model Training
Interpreting the Model
Model Testing and Results
Conclusion and What’s Next
Literature Review

Motivation:

Recommendation systems have become an integral part of many of our lives, perhaps more than we realize. From reading, shopping, and web browsing to travel accommodations and dating, the internet has brought a bewildering range of choices. With so many options, making good decisions can be stressful; however, it’s not hard to recall instances where an algorithm has helped us save time or even make a pivotal decision. As more content moves online and information and media become more convenient to access, recommendation systems will become even more central than they already are.

In particular, music streaming services are growing rapidly, largely because recommendation systems help users navigate their massive content libraries. Yet music recommendation remains a hot area of research precisely because these systems are still far from perfect. In the near future, the quantity of data that streaming services use to make suggestions is likely to grow by an order of magnitude. Furthermore, new dimensions of highly relevant data about users’ emotions and context are expected to become available soon.

Background:

From 2006 to 2009, Netflix hosted a competition with a $1,000,000 purse called ‘The Netflix Prize’, which stimulated much academic interest in machine learning and in recommendation systems (RS) in particular. Other companies have since been inspired to host their own competitions; Spotify is one of the latest examples. Spotify is a music streaming service with a growing catalog of over 40 million songs and more than 170 million users. Songs can be organized into playlists by editorial staff, Spotify users, algorithms, or a combination thereof. Currently, the service has more than 2 billion playlists, a credit to Spotify’s work in making playlist continuation highly engaging.

The company is striving to further improve its system for predicting which songs are good candidates to add to existing playlists. To this end, it created a challenge for which it released the Million Playlist Dataset (MPD), a dataset containing 1,000,000 playlists created by users on Spotify. Participants in the challenge were asked to develop a system for automatic playlist continuation. For details, see: http://recsys-challenge.spotify.com.

This project seeks to take advantage of the trove of playlist data released by Spotify and to take an active part in the current excitement around recommendation systems.