Splitify: Building a Playlist Clustering Tool
Splitify breaks one large playlist into several smaller playlists based on track audio features. I built it because my playlists had turned into broad song dumps that worked poorly on shuffle. Songs I liked were still there, but the listening experience was inconsistent enough that I skipped tracks more than I enjoyed them.
The goal was to keep the songs I liked while reorganizing them into groups that actually felt cohesive.
Why Audio Features
I chose not to split playlists by artist or genre alone because those labels are too broad in practice:
- An artist can sound very different across albums and eras.
- Genres hide a lot of variance once playlists get large.
- Two songs can share an artist or genre label while feeling completely different in energy, mood, and pacing.
Instead, I clustered tracks using audio features:
- energy
- danceability
- valence
- tempo
- acousticness
- instrumentalness
- speechiness
- liveness
- loudness
This also gave users room to tune the output. If the default weighting did not match their taste, they could adjust the feature weights and regenerate the split.
Clustering Design
The core clustering algorithm is a Gaussian Mixture Model (GMM).
I used GMM because music similarity is not perfectly separable, and I wanted a model that could represent softer boundaries between groups. A rigid partitioning algorithm produced clusters that were easier to compute, but less aligned with how playlists actually feel.
High-level flow:
- Standardize audio feature rows.
- Apply user-adjusted feature weights after scaling.
- Evaluate multiple candidate cluster counts.
- Score each candidate using a weighted objective that balances compactness, tail cohesion, inter-cluster uniqueness, BIC, and cluster balance.
- Post-process the winning result by merging tiny clusters and optionally splitting overly broad ones when cohesion improves.
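The first two steps can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the four-feature subset, and the sample values are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical subset of the feature list, in column order.
FEATURES = ["energy", "danceability", "valence", "tempo"]


def standardize(rows):
    """Z-score each feature column; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def apply_weights(rows, weights):
    """Scale each standardized column by its user-chosen weight."""
    return [[v * w for v, w in zip(row, weights)] for row in rows]


# Made-up sample rows: one list of feature values per track.
tracks = [
    [0.9, 0.8, 0.7, 128.0],
    [0.2, 0.3, 0.4, 80.0],
    [0.5, 0.6, 0.1, 100.0],
]
weights = [2.0, 1.0, 1.0, 0.5]  # emphasize energy, de-emphasize tempo

scaled = apply_weights(standardize(tracks), weights)
```

The order matters: z-scoring removes any constant factor applied to a column, so weights applied before scaling would be cancelled out. Applying them afterwards stretches or shrinks each feature's contribution to the distances the clustering model sees.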
I also capped output complexity with limits like maximum cluster count and minimum cluster size so the generated playlists stayed usable.
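As a rough sketch of how a minimum cluster size can be enforced in post-processing, the snippet below reassigns members of undersized clusters to the nearest surviving cluster centroid. The nearest-centroid rule, the function names, and the sample data are assumptions for illustration; the real merge criterion may differ.

```python
from collections import Counter
from math import dist


def centroid(points):
    """Mean position of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def merge_tiny_clusters(points, labels, min_size):
    """Move members of clusters smaller than min_size into the nearest kept cluster."""
    sizes = Counter(labels)
    keep = {label for label, n in sizes.items() if n >= min_size}
    if not keep:
        return list(labels)  # no cluster is large enough to absorb the others

    centroids = {
        label: centroid([p for p, l in zip(points, labels) if l == label])
        for label in keep
    }
    merged = []
    for point, label in zip(points, labels):
        if label in keep:
            merged.append(label)
        else:
            # Pick the kept cluster whose centroid is closest to this track.
            merged.append(min(centroids, key=lambda c: dist(point, centroids[c])))
    return merged


points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [4.6, 4.6]]
labels = [0, 0, 1, 1, 2]
print(merge_tiny_clusters(points, labels, min_size=2))  # [0, 0, 1, 1, 1]
```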
API Constraints and Fallbacks
The hardest engineering problem was external dependency reliability.
When I started the project, Spotify’s API was more practical for this workflow. Later, access to key endpoints became more restricted and effectively tied to Extended Quota approval. That created a circular dependency: I needed a working demo to justify broader access, but broader access was needed to deliver the full demo behavior.
To keep the project moving, I built a fallback path around ReccoBeats for audio features.
That pipeline worked like this:
- Fetch ReccoBeats features in batches.
- For unresolved tracks, fetch Spotify metadata.
- Search Spotify for likely alternate track matches.
- Re-query ReccoBeats for those candidate IDs.
- If a candidate has usable features, map them back to the original missing track.
That approach recovered a meaningful number of misses without hiding unresolved coverage gaps from the user.
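The control flow of that fallback can be sketched as follows. The two callables are hypothetical stand-ins: `fetch_features` represents a batched ReccoBeats lookup, and `find_alternates` represents the Spotify metadata fetch plus search that yields candidate IDs. Neither reflects a real client signature, and the batch size of 40 is an arbitrary assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Features = Dict[str, float]


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def resolve_features(
    track_ids: List[str],
    fetch_features: Callable[[List[str]], Dict[str, Features]],  # hypothetical
    find_alternates: Callable[[str], List[str]],  # hypothetical
    batch_size: int = 40,  # assumed value
) -> Tuple[Dict[str, Features], List[str]]:
    """Return (features by original track ID, IDs that stayed unresolved)."""
    resolved: Dict[str, Features] = {}
    for batch in chunked(track_ids, batch_size):
        resolved.update(fetch_features(batch))

    unresolved: List[str] = []
    for track_id in track_ids:
        if track_id in resolved:
            continue
        candidates = find_alternates(track_id)
        found = fetch_features(candidates) if candidates else {}
        # Take the first candidate with usable features, in search order.
        match = next((found[c] for c in candidates if c in found), None)
        if match is None:
            unresolved.append(track_id)  # reported to the user, not hidden
        else:
            resolved[track_id] = match  # mapped back to the original ID
    return resolved, unresolved


# Usage with in-memory fakes in place of the real services.
known = {"a": {"energy": 0.5}, "b2": {"energy": 0.9}}


def fake_fetch(ids):
    return {i: known[i] for i in ids if i in known}


def fake_alternates(track_id):
    return {"b": ["b1", "b2"]}.get(track_id, [])


resolved, unresolved = resolve_features(["a", "b", "c"], fake_fetch, fake_alternates)
```

Returning the unresolved IDs alongside the resolved features is what lets the caller show coverage gaps instead of silently dropping tracks.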
Full-Stack Implementation
The frontend is a React app with two primary flows:
- /login starts OAuth.
- /input-playlist handles playlist selection, split controls, and processing progress.
Users can choose preset split criteria or open advanced controls to tune feature weights manually. Once processing begins, the client requests a job ID, polls job status, and displays progress, completion state, and the last finished playlist.
On the backend, I handled:
- OAuth and session management
- playlist retrieval
- audio feature resolution
- clustering
- new playlist creation and population
- async job status tracking
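A job tracker that supports the request-then-poll pattern could look roughly like this. It is an in-memory sketch under assumed names (`JobRegistry`, the status fields, and the state strings are all hypothetical), not the project's implementation.

```python
import threading
import uuid


class JobRegistry:
    """Thread-safe, in-memory job status store (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def create(self, total_playlists):
        """Register a new job and return the ID the client will poll with."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {
                "state": "queued",
                "done": 0,
                "total": total_playlists,
                "last_playlist": None,
            }
        return job_id

    def playlist_finished(self, job_id, playlist_name):
        """Record one finished playlist; mark the job complete on the last one."""
        with self._lock:
            job = self._jobs[job_id]
            job["done"] += 1
            job["last_playlist"] = playlist_name
            job["state"] = "complete" if job["done"] >= job["total"] else "running"

    def status(self, job_id):
        """Return a snapshot copy of the job, or None for an unknown ID."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None


registry = JobRegistry()
job_id = registry.create(total_playlists=2)
registry.playlist_finished(job_id, "High Energy")
snapshot = registry.status(job_id)
```

Returning a copy from `status` keeps pollers from observing a half-updated record while a worker thread is writing to it.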
To keep larger jobs stable, I added:
- an in-process L1 cache for feature lookups
- a Postgres-backed L2 cache for shared feature and miss records
- bounded concurrency for external calls and playlist creation
- retry and backoff behavior for transient failures
- 429 handling with Retry-After support
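Retry with backoff and Retry-After handling can be sketched like this. The `HttpError` class, the set of retryable status codes, and the delay parameters are assumptions for the example; the sketch also assumes Retry-After arrives as a number of seconds, although the header may carry an HTTP date instead.

```python
import random
import time


class HttpError(Exception):
    """Hypothetical error type carrying the status and any Retry-After seconds."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of transient statuses


def call_with_retry(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                    sleep=time.sleep):
    """Call request() until it succeeds, a non-retryable error occurs,
    or max_attempts is reached (the last error is re-raised)."""
    for attempt in range(max_attempts):
        try:
            return request()
        except HttpError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE or is_last:
                raise
            if err.status == 429 and err.retry_after is not None:
                delay = err.retry_after  # the server told us how long to wait
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay / 2)  # jitter spreads out retries
            sleep(delay)
```

Passing `sleep` as a parameter makes the function easy to exercise in tests without real waiting.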
Those pieces mattered as much as the model itself. Without caching, concurrency control, and fallback handling, the product felt unreliable even when the clustering logic was correct.
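The two cache tiers can be illustrated with a read-through lookup. In this sketch a plain dict stands in for the Postgres-backed L2 table, `fetch` is a hypothetical function that returns features or None, and a stored None represents a recorded miss. All of those names and conventions are assumptions.

```python
class TwoTierFeatureCache:
    """Read-through cache: per-process L1 dict, then shared L2, then the source."""

    def __init__(self, l2_store, fetch):
        self._l1 = {}
        self._l2 = l2_store  # mapping standing in for a shared Postgres table
        self._fetch = fetch  # hypothetical: returns a feature dict or None

    def get(self, track_id):
        if track_id in self._l1:
            return self._l1[track_id]
        if track_id in self._l2:
            value = self._l2[track_id]
        else:
            value = self._fetch(track_id)
            # Store misses (None) too, so known gaps are not re-queried.
            self._l2[track_id] = value
        self._l1[track_id] = value
        return value


shared_store = {}
fetch_log = []


def fake_fetch(track_id):
    fetch_log.append(track_id)
    return {"energy": 0.5} if track_id == "a" else None


cache = TwoTierFeatureCache(shared_store, fake_fetch)
cache.get("a")
cache.get("a")       # served from L1, no second fetch
cache.get("absent")  # recorded as a miss
```

Caching misses is the less obvious half: without it, every job would re-query the same tracks that the feature source is already known not to have.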
Security and Session Handling
Security controls included:
- OAuth state verification for CSRF protection
- explicit scope tracking
- an HttpOnly session cookie
- configurable Secure and SameSite cookie settings
- restricted CORS configuration
- standard response security headers
- HSTS on secure requests
I also added token-expiration handling so expired sessions triggered a clean re-auth path instead of ambiguous client failures.
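For readers unfamiliar with the OAuth state check, here is a minimal sketch of the idea. The `session` dict stands in for whatever server-side session storage the backend uses, and the function names and the `oauth_state` key are hypothetical.

```python
import hmac
import secrets


def new_oauth_state(session: dict) -> str:
    """Generate an unguessable state value and remember it for this session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    return state  # sent as the `state` parameter of the authorization request


def verify_oauth_state(session: dict, returned_state) -> bool:
    """Check the state echoed back on the callback; each value is single-use."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned_state:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), returned_state.encode())


session = {}
state = new_oauth_state(session)
first_check = verify_oauth_state(session, state)   # True
replay_check = verify_oauth_state(session, state)  # False: already consumed
```

A callback that arrives without the matching state was not initiated by this session, which is the cross-site request forgery case the check is meant to reject.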
Deployment
The current repo deploys frontend changes to Firebase Hosting on pushes to main. Backend linting is automated, while backend runtime deployment is handled through Google Cloud on merges to main.
Repository: github.com/stanleygvi/splitify