The Shuffler: using data clustering for a redesigned smart shuffle algorithm

Oct 17, 2023

[This is a deep dive for a contribution to the Pensieve Project: https://pensieveproject.com/ ]

Problem Statement

Collecting your favorite songs in a single playlist is a very personal experience that many of us have had the pleasure of enjoying over the years. As we grow and change, so do our playlists. Over time you expand your musical taste and knowledge, and you are now the proud owner of a set that would make any DJ jealous. What started as a simple set of a dozen tracks meant to get you through your workouts has turned into a soundtrack for your life.

These songs, while each holds significance for you, were never really meant to be played in just any random order. You shuffle through your favorite playlist on a regular basis, but now you find yourself skipping many songs because you’ve already heard them 10 times this week (shuffle is too repetitive). Sometimes the music player finds the right tracks to get you started, then throws a curveball by abruptly shifting the genre or tempo (shuffle lacks context).

What if the shuffle functionality on your favorite music player was just a little smarter? Just enough to keep the vibes consistently immaculate?

Technical Concept to explore

Data clustering, also known as data segmentation, is a data mining technique that groups similar data points into clusters. The goal is to divide a dataset into groups such that the data points within each group are more similar to each other than to data points in other groups.
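
To make the idea concrete, here is a minimal sketch of k-means, one common clustering technique. It is only an illustration: the post does not say which clustering method The Shuffler uses, and the track features below (scaled tempo and energy) are made-up values.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: return a cluster index for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


# Hypothetical tracks described as (tempo in BPM / 200, energy from 0 to 1).
tracks = [
    (0.35, 0.20), (0.38, 0.25), (0.40, 0.22),  # slow, calm
    (0.85, 0.90), (0.88, 0.95), (0.90, 0.85),  # fast, energetic
]
labels = kmeans(tracks, k=2)
```

The three calm tracks end up sharing one label and the three energetic tracks the other, which is exactly the "more similar within a group than across groups" property described above.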

Project Description

The Shuffler is a reimagined smart shuffle for music players. It groups the songs in a given playlist into subsets based on track metadata (e.g., genre, beat, or tempo) as well as additional factors (e.g., artist or language). This allows us to identify connections between tracks in order to maintain the “vibe”.
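
As a rough sketch of what grouping a playlist into subsets could look like, the snippet below buckets tracks by a simple profile of genre plus tempo band. The `Track` fields, the 30 BPM band width, and the sample tracks are assumptions chosen for illustration, not the project's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical fields; real metadata would come from the music service.
    title: str
    artist: str
    genre: str
    tempo: float  # beats per minute
    language: str


def tempo_band(tempo, width=30):
    """Bucket a tempo into a band, e.g. 95 BPM falls in '90-119'."""
    low = int(tempo // width) * width
    return f"{low}-{low + width - 1}"


def group_by_profile(tracks):
    """Group tracks that share the same (genre, tempo band) profile."""
    groups = defaultdict(list)
    for track in tracks:
        groups[(track.genre, tempo_band(track.tempo))].append(track)
    return dict(groups)


playlist = [
    Track("Song A", "Artist 1", "rock", 125.0, "en"),
    Track("Song B", "Artist 2", "rock", 132.0, "en"),
    Track("Song C", "Artist 3", "jazz", 95.0, "fr"),
    Track("Song D", "Artist 1", "rock", 92.0, "en"),
]
subsets = group_by_profile(playlist)
```

Here "Song A" and "Song B" land in the same subset, while the slower rock track and the jazz track each get their own. Artist or language could be added to the profile key in the same way.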

Primary Application

Rebuild Spotify’s shuffle algorithm to account for song metadata such as genre, language, artist, and tempo.

Future Iterations

Use machine learning models and recommendation algorithms to generate custom playlists
Explore the world of music therapy by creating mood-shifting playlists

MVP Approach

Build a microservice that connects to Spotify’s APIs to enable key functionalities such as Login, Playlist retrieval, Playlist creation, Track retrieval, Search, and Metadata analysis
Build a new algorithm that finds affinity in a dataset and separates samples into small groups, with a transition key to link them. Turn a large dataset into connected nodes by identifying data profiles and grouping items based on profile resemblance
Build a React web app for the user experience
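
The algorithm step above could be sketched roughly as follows. This is one possible reading, not the project's implementation: tracks are grouped by genre, shuffled inside each group, and the groups are then chained so that each one is followed by the remaining group with the closest average tempo. Treating the average tempo as the "transition key" is an assumption made for this example, as are the function name and the sample data.

```python
import random
from collections import defaultdict


def smart_shuffle(tracks, seed=None):
    """tracks: list of (title, genre, tempo) tuples. Returns a new play order."""
    if not tracks:
        return []
    rng = random.Random(seed)

    # 1. Separate the playlist into small groups (here: one per genre).
    groups = defaultdict(list)
    for track in tracks:
        groups[track[1]].append(track)

    # 2. Randomize the order inside each group.
    for members in groups.values():
        rng.shuffle(members)

    # 3. Assumed transition key: the mean tempo of each group.
    key = {g: sum(t[2] for t in m) / len(m) for g, m in groups.items()}

    # 4. Chain the groups: start anywhere, then always move to the
    #    remaining group whose key is closest to the current one.
    remaining = list(groups)
    current = rng.choice(remaining)
    order = []
    while remaining:
        remaining.remove(current)
        order.extend(groups[current])
        if remaining:
            current = min(remaining, key=lambda g: abs(key[g] - key[current]))
    return order


playlist = [
    ("Song A", "rock", 120), ("Song B", "jazz", 90), ("Song C", "rock", 130),
    ("Song D", "pop", 118), ("Song E", "jazz", 95),
]
queue = smart_shuffle(playlist, seed=1)
```

Every track still plays exactly once, but tracks from the same group stay together and the jump between groups is kept as small as the chosen key allows. A greedy chain like this is simple rather than optimal; a fuller version might weigh several metadata fields at once.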

Key Functionality

OAuth Login: Ability to log in using Spotify’s API || Leverage data for user profile info
User Profile: Page to display user info || Could be used later to customize user experience
User Dashboard: Display a summary of functionality available to user
Playlist Page: List of playlists available (created, followed, contributing to, etc.)
Shuffle: shuffle... but better
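
For the OAuth login item, the first step of Spotify's authorization code flow is sending the user to the authorization page. The sketch below only builds that URL; the client ID and redirect URI are placeholders, the helper name is hypothetical, and the scope list is a guess at what the features above would need.

```python
import secrets
from urllib.parse import urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def build_login_url(client_id, redirect_uri, scopes):
    """Return (url, state) for the first leg of the authorization code flow."""
    # Random state value, to be checked again when the user is redirected back.
    state = secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


# Placeholder values; a real app would use its own registered credentials.
url, state = build_login_url(
    client_id="my-client-id",
    redirect_uri="http://localhost:3000/callback",
    scopes=["user-read-private", "playlist-read-private", "playlist-modify-private"],
)
```

After the user approves, the service would exchange the returned code for an access token and use it for the profile and playlist calls; that part is left out here.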

MVP User Flow