Your Next Favourite Movie Isn’t Random: Here’s Why
The Algorithm That Knows You Better Than You Know Yourself
I used to think Netflix knew me. Then, one rainy evening, it suggested a documentary about deep-sea fish that I never would have searched for, and I was hooked for two hours. That's when I realised the algorithm was smarter than my own intuition. In this post, I want to break down exactly how these systems work, from the maths behind them to the real-world engineering powering the platforms I use every day.
The Machine Behind Every "You Might Also Like"
A Recommendation System is a type of information filtering system that predicts and surfaces content, products, or services most relevant to a specific user — before they even know they want it. I think of it as a digital friend who has watched everything you've ever watched, read your ratings, and has the memory of an elephant.
These systems are everywhere in my daily life: Spotify's Discover Weekly, Amazon's product suggestions, YouTube's autoplay, and of course, Netflix. The underlying goal is always the same — model user preferences and predict future behaviour.
Collaborative Filtering
Uses the collective behaviour of all users to predict what one user might like based on similar users or items.
Content-Based Filtering
Recommends items similar to ones a user has liked by analysing item features and user preference profiles.
Hybrid Systems
Combines collaborative and content-based methods so that each covers the other's limitations, such as the cold-start problem.
Knowledge-Based
Leverages explicit domain knowledge and user requirements, which makes it common in real estate and other high-stakes domains.
Deep Learning
Neural networks (RNNs, Transformers, Autoencoders) learn complex non-linear user-item interaction patterns.
Context-Aware
Incorporates contextual signals like time, location, and device to personalise recommendations further.
The Power of "People Like You"
👤 User-Based Collaborative Filtering
This method recommends items based on users with similar interests.
- Find users with similar preferences
- Use similarity measures like Cosine Similarity
- Recommend items liked by similar users
- Works well when user data is rich
🔗 Item-Based Collaborative Filtering
This method recommends items similar to what the user already likes.
- Find similarity between items
- Use past user ratings to build relationships
- Recommend similar items to the user
- More stable than the user-based approach
When I present this topic in my seminar, I always start here: Collaborative Filtering (CF) is the backbone of modern recommendation systems. The intuition is beautifully simple — if two users agreed on many things in the past, they'll probably agree in the future too.
👥 User-Based Collaborative Filtering
The four-step process I covered in my seminar slide (Slide 1 above) walks through the full pipeline, from finding users with similar preferences to recommending the items they liked.
📐 Similarity Metrics
Step 2 is where the maths gets interesting. As shown in my slide, we calculate how "close" two users are in preference-space using one of three methods. Cosine Similarity is the most widely used:
cos(A,B) = Σ(Aᵢ × Bᵢ) / [√Σ(Aᵢ²) × √Σ(Bᵢ²)]
Pearson Correlation:
r = Σ[(xᵢ−x̄)(yᵢ−ȳ)] / [√Σ(xᵢ−x̄)² × √Σ(yᵢ−ȳ)²]
Euclidean Distance:
d(A,B) = √Σ(Aᵢ − Bᵢ)² (lower = more similar)
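The three metrics translate almost line for line into code. Here is a minimal Python sketch using plain lists; it assumes both vectors list the same films in the same order and that neither vector is all zeros (or, for Pearson, constant), since those cases would divide by zero. The sample rows are Alice's, Bob's and Carol's ratings from the table that follows.

```python
import math


def cosine_similarity(a, b):
    """cos(A,B) = Σ(Aᵢ × Bᵢ) / [√Σ(Aᵢ²) × √Σ(Bᵢ²)]"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson r is the cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def euclidean_distance(a, b):
    """d(A,B) = √Σ(Aᵢ − Bᵢ)²; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Full rows for the three users who rated all five films.
alice = [5, 4, 2, 5, 3]
bob = [4, 5, 3, 4, 3]
carol = [3, 2, 5, 2, 4]

for name, other in [("Bob", bob), ("Carol", carol)]:
    print(name,
          round(cosine_similarity(alice, other), 3),
          round(pearson_correlation(alice, other), 3),
          round(euclidean_distance(alice, other), 3))
```

Raw cosine still scores Alice and Carol at about 0.81, because every rating is positive. Pearson subtracts each user's mean first and gives about −0.85, correctly flagging their tastes as opposite.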
| User / Movie | Inception | Interstellar | Tenet | The Dark Knight | Dunkirk |
|---|---|---|---|---|---|
| Alice | 5 | 4 | 2 | 5 | 3 |
| Bob | 4 | 5 | 3 | 4 | 3 |
| Carol | 3 | 2 | 5 | 2 | 4 |
| David | 5 | — | 1 | 5 | ? |
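David hasn't rated Dunkirk. On the three films all four users have rated (Inception, Tenet, The Dark Knight), David's cosine similarity is about 0.99 with Alice, 0.94 with Bob and only 0.68 with Carol. Alice and Bob, his two nearest neighbours, both gave Dunkirk a 3, so we predict he'll rate it ~3/5.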
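The prediction can be reproduced in a few lines. This is a sketch under two assumptions of my own: similarity is cosine computed only over films both users have rated, and the prediction is a similarity-weighted average of the k = 2 most similar users' ratings (k is a hypothetical choice; real systems tune it).

```python
import math

# Ratings from the table above; a missing key means "not rated".
ratings = {
    "Alice": {"Inception": 5, "Interstellar": 4, "Tenet": 2, "The Dark Knight": 5, "Dunkirk": 3},
    "Bob": {"Inception": 4, "Interstellar": 5, "Tenet": 3, "The Dark Knight": 4, "Dunkirk": 3},
    "Carol": {"Inception": 3, "Interstellar": 2, "Tenet": 5, "The Dark Knight": 2, "Dunkirk": 4},
    "David": {"Inception": 5, "Tenet": 1, "The Dark Knight": 5},
}


def user_similarity(u, v):
    """Cosine similarity computed only over the films both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_user_based(user, movie, k=2):
    """Similarity-weighted average of the k most similar users' ratings for `movie`."""
    target = ratings[user]
    neighbours = sorted(
        ((user_similarity(target, r), r[movie])
         for name, r in ratings.items() if name != user and movie in r),
        reverse=True,
    )[:k]
    total = sum(abs(sim) for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in neighbours) / total


print(round(predict_user_based("David", "Dunkirk"), 2))
```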
🔗 Item-Based Collaborative Filtering
My second slide illustrates this beautifully. Instead of asking "who is similar to User D?", Item-Based CF asks: "which items are similar to items User D already liked?" In the diagram, Items A and C have a similarity score of 0.99 — so User D's predicted rating for Item E is derived from this high similarity.
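Item-Based CF (popularised by Amazon in 2003) tends to scale better than User-Based CF because item relationships are more stable over time: users' tastes change, but an action movie's similarity to another action movie doesn't. That stability means item-item similarities can be precomputed offline and reused, instead of being recomputed for every request.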
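For comparison, here is the item-based version on the same toy table, as a minimal sketch: item-item similarity is cosine over the users who rated both films, and David's Dunkirk rating is a similarity-weighted average of his own ratings. In a real deployment the similarities would be precomputed offline rather than calculated per request as they are here.

```python
import math

# Same toy ratings as the table above; a missing key means "not rated".
ratings = {
    "Alice": {"Inception": 5, "Interstellar": 4, "Tenet": 2, "The Dark Knight": 5, "Dunkirk": 3},
    "Bob": {"Inception": 4, "Interstellar": 5, "Tenet": 3, "The Dark Knight": 4, "Dunkirk": 3},
    "Carol": {"Inception": 3, "Interstellar": 2, "Tenet": 5, "The Dark Knight": 2, "Dunkirk": 4},
    "David": {"Inception": 5, "Tenet": 1, "The Dark Knight": 5},
}


def item_similarity(a, b):
    """Cosine similarity between two films, over the users who rated both."""
    users = [u for u, r in ratings.items() if a in r and b in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in users)
    norm_a = math.sqrt(sum(ratings[u][a] ** 2 for u in users))
    norm_b = math.sqrt(sum(ratings[u][b] ** 2 for u in users))
    return dot / (norm_a * norm_b)


def predict_item_based(user, movie):
    """Weight the user's own ratings by each rated film's similarity to `movie`."""
    pairs = [(item_similarity(movie, seen), rating)
             for seen, rating in ratings[user].items() if seen != movie]
    total = sum(abs(sim) for sim, _ in pairs)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in pairs) / total


print(round(predict_item_based("David", "Dunkirk"), 2))
```

On this toy table the estimate lands around 3.6 rather than 3. Raw cosine between all-positive rating columns is uniformly high, so the prediction drifts toward David's own average (about 3.67); adjusted cosine, which subtracts each user's mean rating before comparing items, is the usual fix.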
Decomposing Preferences into Hidden Factors
This is arguably the most elegant idea in all of recommendation system theory. The core observation: a real rating matrix R (Users × Items) is huge and sparse — most entries are missing. We can approximate it as the product of two smaller matrices:
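R ≈ P × Qᵀ, where:
R = (m × n) rating matrix
P = (m × k) user latent factor matrix
Q = (n × k) item latent factor matrix
k = number of latent factors (e.g. 10–200)
Objective: minimise Σ₍ᵢ,ⱼ₎∈K (rᵢⱼ − pᵢ·qⱼᵀ)² + λ(‖P‖² + ‖Q‖²)
(K = set of observed ratings; pᵢ, qⱼ = rows of P and Q; λ = regularisation strength)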
Figure legend: green = "action preference", purple/pink = "drama preference"; the k = 2 latent factors capture hidden taste dimensions.
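The beauty here: the k latent factors aren't pre-defined; they emerge from the data. One factor might represent "preference for action films", another "preference for cerebral narratives". The algorithm discovers these patterns on its own by minimising the objective above, typically with stochastic gradient descent (SGD; the Netflix Prize-era "SVD" models were trained this way) or with alternating least squares (ALS), which fixes one factor matrix and solves for the other in turn. Classical truncated SVD cannot be applied directly, because it needs a fully filled-in matrix.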
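Below is a minimal NumPy sketch of the SGD variant, run on the Nolan table from earlier with David's missing ratings marked as NaN. It omits the user and item bias terms that practical models usually add, and the learning rate, λ (`reg`), k and epoch count are arbitrary choices for this toy example.

```python
import numpy as np


def factorise(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """SGD matrix factorisation: learn P (m x k) and Q (n x k) so that R ≈ P Qᵀ on observed cells."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.3, size=(m, k))
    Q = rng.normal(scale=0.3, size=(n, k))
    observed = np.argwhere(~np.isnan(R))  # (i, j) pairs of known ratings, i.e. the set K
    for _ in range(epochs):
        for idx in rng.permutation(len(observed)):
            i, j = observed[idx]
            err = R[i, j] - P[i] @ Q[j]
            p_i = P[i].copy()  # use the pre-update user vector for Q's step
            # Per-rating gradient step on (err² + reg·(‖pᵢ‖² + ‖qⱼ‖²)); the factor 2 is folded into lr.
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_i - reg * Q[j])
    return P, Q


nan = np.nan
# Rows: Alice, Bob, Carol, David. Columns: Inception, Interstellar, Tenet, The Dark Knight, Dunkirk.
R = np.array([[5, 4, 2, 5, 3],
              [4, 5, 3, 4, 3],
              [3, 2, 5, 2, 4],
              [5, nan, 1, 5, nan]], dtype=float)

P, Q = factorise(R)
R_hat = P @ Q.T
mask = ~np.isnan(R)
train_rmse = np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2))
print(f"training RMSE: {train_rmse:.3f}")
print(f"David's predicted Dunkirk rating: {R_hat[3, 4]:.2f}")
```

`R_hat[3, 4]` is the model's guess for David's Dunkirk rating. With this little data the exact value depends on the seed and hyperparameters, so treat it as illustrative.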
If You Liked That, You'll Love This
Content-Based Filtering takes a completely different approach. Instead of relying on what other users think, it focuses entirely on the properties of items themselves. I think of it as asking: "What did I like about that item — and which other items share those qualities?"
🎵 The Spotify Example
Spotify's content-based engine analyses each song's audio features (tempo, key, danceability, energy, acousticness, valence) and builds a profile of what you enjoy. If I love high-energy tracks at 128 BPM in a minor key, it surfaces more like them, even by artists I've never heard of.
Similarity to a seed item is computed as the cosine similarity between TF-IDF or audio-feature vectors.
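A minimal content-based sketch, assuming made-up tracks described by four audio-style features already scaled to [0, 1]. Tempo is left out because raw BPM values would dominate the cosine. The user profile is simply the mean of the liked tracks' vectors; TF-IDF vectors built from text metadata could be dropped in the same way.

```python
import numpy as np

# Hypothetical tracks; features: [energy, danceability, acousticness, valence], each in [0, 1].
liked = {
    "Track A": [0.90, 0.80, 0.10, 0.30],
    "Track B": [0.85, 0.75, 0.05, 0.40],
}
catalogue = {
    "Track C": [0.88, 0.82, 0.08, 0.35],
    "Track D": [0.20, 0.30, 0.90, 0.60],
    "Track E": [0.60, 0.50, 0.40, 0.50],
}


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend_by_content(liked, catalogue):
    """Build a taste profile as the mean of liked feature vectors, then rank unseen tracks."""
    profile = np.mean([np.array(v) for v in liked.values()], axis=0)
    scores = {name: cosine(profile, np.array(vec)) for name, vec in catalogue.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(recommend_by_content(liked, catalogue))
```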
No cold-start problem for items. As soon as a new movie is catalogued with metadata (genre, cast, director, runtime), it can be recommended immediately — even before any user has rated it.
Content-based systems suffer from the "filter bubble" effect — they tend to recommend more of the same, limiting serendipitous discovery. They also fail when item metadata is sparse or missing.
The Best of Both Worlds
No single approach is perfect. In practice, most major platforms use hybrid systems that combine multiple techniques to cover each other's blind spots. Here's how I map the major hybrid strategies:
| Hybrid Strategy | How It Works | Used By |
|---|---|---|
| Weighted | Combine scores from CF and content-based linearly (e.g. 0.6×CF + 0.4×CB) | Pandora |
| Switching | Choose CF when sufficient data exists; fall back to CB for cold-start users | Netflix |
| Cascade | Use CF to generate candidates, then content-based to re-rank and filter | YouTube |
| Feature Augmentation | Feed content features as extra inputs into a CF model (e.g. Wide & Deep, Two-Tower) | Google, Meta |
| Meta-Level | Train a model that uses the output of one technique as features for another | Amazon |
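Two of these strategies are easy to sketch. The 0.6/0.4 weights come from the table; the 20-rating switching threshold and all the scores are hypothetical, and the sketch assumes both scorers output values on the same 0 to 1 scale (otherwise a linear blend is meaningless).

```python
def weighted_hybrid(cf_scores, cb_scores, w_cf=0.6, w_cb=0.4):
    """Weighted: linear blend of CF and content-based scores (a missing score counts as 0)."""
    items = cf_scores.keys() | cb_scores.keys()
    return {item: w_cf * cf_scores.get(item, 0.0) + w_cb * cb_scores.get(item, 0.0)
            for item in items}


def switching_hybrid(num_ratings, cf_scores, cb_scores, min_ratings=20):
    """Switching: trust CF once the user has enough ratings, else fall back to content-based."""
    return cf_scores if num_ratings >= min_ratings else cb_scores


def top_n(scores, n=3):
    return sorted(scores, key=scores.get, reverse=True)[:n]


cf = {"Dunkirk": 0.9, "Tenet": 0.2}    # hypothetical CF scores
cb = {"Tenet": 0.8, "Memento": 0.7}    # hypothetical content-based scores
print(top_n(weighted_hybrid(cf, cb)))
print(top_n(switching_hybrid(3, cf, cb)))
```

The weighted blend lets Memento surface from the content-based side even though the CF model never scored it, which is exactly the kind of blind spot hybrids are meant to cover.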
Inside Netflix's Recommendation Engine
Netflix — The World's Most Studied Recommender
Personalisation at 270 million subscriber scale
⚙️ How Netflix's System Actually Works
Netflix doesn't use just one algorithm — it's a layered ensemble of specialised models working together. Here's how I understand the architecture:
1. Candidate Generation: Multiple candidate generators run in parallel, including a Matrix Factorisation model (ALS on implicit feedback), a Row-Based Neural Network, and a Two-Tower deep learning model that separately encodes user context and item features. Each generates thousands of candidate titles.
2. Ranking: A gradient-boosted tree model or neural ranker scores all candidates together, using hundreds of features — play duration, device type, time of day, country, user's current "session mood", and historical completion rates.
3. Page Assembly: Netflix's homepage isn't just a list — it's a grid of rows, each algorithmically generated and titled (e.g. "Because you watched Dark"). The row ordering and the thumbnail image shown are both personalised per user.
Netflix runs multi-armed bandit experiments on artwork. If you love action films, you see the action-heavy thumbnail for a movie; a romance fan sees the same movie with a different scene. I find this the most underrated part of their personalisation stack.
4. Diversity Injection: A post-ranking step forces diversity into the final recommendations to avoid the filter bubble — ensuring genres, languages, and content types are sufficiently varied.
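To make the stages concrete, here is a toy Python sketch of candidate generation, ranking and a diversity pass. Everything in it is hypothetical: the generators return fixed lists, a lookup table stands in for the ranking model, and the per-genre cap is just one simple diversity rule, not Netflix's actual method. Page assembly and artwork bandits are left out.

```python
from collections import Counter

# Hypothetical catalogue metadata and ranker scores for one user.
GENRE = {"t1": "action", "t2": "action", "t3": "action",
         "t4": "drama", "t5": "documentary", "t6": "comedy"}
RANKER_SCORE = {"t1": 0.95, "t2": 0.93, "t3": 0.91, "t4": 0.80, "t5": 0.70, "t6": 0.60}


# Stage 1: hypothetical stand-ins for independent candidate generators
# (e.g. a matrix-factorisation model and a two-tower model).
def mf_generator(user):
    return ["t1", "t2", "t4"]


def two_tower_generator(user):
    return ["t2", "t3", "t5", "t6"]


def generate_candidates(user, generators):
    """Union the outputs of all generators, removing duplicates."""
    candidates = set()
    for generate in generators:
        candidates.update(generate(user))
    return candidates


def rank(candidates):
    """Stage 2: score every candidate with one ranking model (here a lookup table)."""
    return sorted(candidates, key=RANKER_SCORE.get, reverse=True)


def diversify(ranked, max_per_genre=2, n=4):
    """Stage 4: greedy post-ranking pass capping how many titles share a genre."""
    picked, per_genre = [], Counter()
    for title in ranked:
        if per_genre[GENRE[title]] < max_per_genre:
            picked.append(title)
            per_genre[GENRE[title]] += 1
        if len(picked) == n:
            break
    return picked


candidates = generate_candidates("user_42", [mf_generator, two_tower_generator])
print(diversify(rank(candidates)))
```

Without the diversity pass the top four would be three action titles plus one drama; the cap swaps the third action title for a documentary.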
From Academic Paper to Plain English
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
This landmark paper introduced the item-to-item CF algorithm that powers Amazon's "Customers who bought this also bought..." — one of the most commercially successful applications of recommender systems ever built.
Traditional User-Based CF at Amazon's scale (millions of users, millions of items) was computationally infeasible in real time. Finding a shopper's nearest neighbours on demand means comparing them against every other customer, and keeping all pairwise user similarities up to date costs O(n²) in the number of users; neither can be done in the instant a shopper lands on a product page.
💡 The Elegant Solution — Precompute Item Similarities
The genius insight: instead of computing user similarities at query time, precompute all item-to-item similarities offline. Item relationships are far more stable than user preferences — a camera lens is always similar to other camera lenses. This computation can happen in a batch job every few hours, not in the milliseconds of a user request.
At serving time, the algorithm simply looks up: "What items are most similar to the items this user has purchased or rated?" Each lookup reads a precomputed list, so the cost grows with the size of the user's own history rather than with the number of customers, and no O(n²) computation happens at request time. The result? Recommendations that scale to hundreds of millions of users at real-time latency.
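Here is a toy Python sketch of that split between offline and online work. The purchase data is made up, and item similarity is cosine over binary "who bought this" vectors, a common choice I'm assuming here; the production system's similarity measure, thresholds and data structures may differ.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "c1": {"camera", "lens", "tripod"},
    "c2": {"camera", "lens"},
    "c3": {"lens", "tripod"},
    "c4": {"novel", "bookmark"},
    "c5": {"novel", "bookmark", "lens"},
}


def build_similar_items(purchases, top_n=3):
    """Offline batch job: cosine similarity between items' binary 'who bought it' vectors."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    neighbours = defaultdict(list)
    for a, b in combinations(sorted(buyers), 2):
        overlap = len(buyers[a] & buyers[b])
        if overlap:
            sim = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
            neighbours[a].append((sim, b))
            neighbours[b].append((sim, a))
    # Keep only each item's top-N most similar items as (similarity, item) pairs.
    return {item: sorted(pairs, reverse=True)[:top_n] for item, pairs in neighbours.items()}


def recommend(history, similar_items, n=3):
    """Serving time: read each owned item's precomputed neighbours; no user-user comparisons."""
    scores = defaultdict(float)
    for item in history:
        for sim, other in similar_items.get(item, []):
            if other not in history:
                scores[other] += sim
    return sorted(scores, key=lambda other: (-scores[other], other))[:n]


similar_items = build_similar_items(purchases)  # rerun periodically as a batch job
print(recommend({"camera"}, similar_items))
print(recommend({"camera", "lens"}, similar_items))
```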
What I find most insightful about this 2003 paper is that it didn't win because of better maths — the algorithm itself is straightforward. It won because the authors asked "how do we make this work at real-world scale?" Engineering pragmatism beat theoretical elegance. That lesson applies to almost every ML system I'll ever build.
What Makes This Hard in the Real World
❄️ The Cold Start Problem
When a brand new user signs up, there's no historical data to base recommendations on. Similarly, when a new item is added, no one has rated it yet. This "cold start" problem is one of the central research challenges in the field.
Platforms handle cold start through: onboarding surveys (Netflix asks your favourite genres), demographic-based priors, content-based fallback, and knowledge graph embeddings that can bootstrap recommendations from item metadata alone.
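The simplest of those fallbacks can be sketched in a few lines: rank titles from the genres a new user picked during onboarding by popularity, and fall back to global popularity if nothing matches. The catalogue, view counts and function name are hypothetical.

```python
# Hypothetical catalogue with popularity counts.
catalogue = [
    {"title": "Deep Sea Fish", "genre": "documentary", "views": 120},
    {"title": "Space Race", "genre": "documentary", "views": 300},
    {"title": "Heist Night", "genre": "action", "views": 900},
    {"title": "Quiet Harbour", "genre": "drama", "views": 450},
]


def cold_start_recommendations(favourite_genres, catalogue, n=3):
    """Most-viewed titles in the genres picked during onboarding, else globally most-viewed."""
    pool = [t for t in catalogue if t["genre"] in favourite_genres] or catalogue
    ranked = sorted(pool, key=lambda t: t["views"], reverse=True)
    return [t["title"] for t in ranked[:n]]


print(cold_start_recommendations({"documentary"}, catalogue))
print(cold_start_recommendations({"western"}, catalogue))
```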
🕳️ Data Sparsity
In a typical real-world rating matrix, well over 99% of entries are empty. A user who has rated 50 of the 10,000 movies in a catalogue has a preference vector that is 99.5% empty. Matrix Factorisation and deep learning approaches are specifically designed to handle this.
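The arithmetic for the 50-out-of-10,000 example, plus the standard storage trick, fits in a few lines: keep only the observed ratings rather than a dense row. The placeholder rating value of 4 and the title keys are hypothetical.

```python
CATALOGUE_SIZE = 10_000

# Store only the observed ratings (a sparse dict), not a 10,000-slot dense row.
user_ratings = {f"title_{i}": 4 for i in range(50)}

density = len(user_ratings) / CATALOGUE_SIZE
sparsity = 1 - density
print(f"filled: {density:.1%}  empty: {sparsity:.1%}")
```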
🗓️ Temporal Dynamics
User preferences drift over time. My movie taste at 18 is very different from my taste now. Models like TimeSVD++ and sequence-aware recommenders (RNNs, Transformers) explicitly model these temporal shifts in preferences.
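TimeSVD++ builds time-dependent terms into matrix factorisation itself; a much simpler technique, shown here instead, is to down-weight old interactions with exponential decay. This Python sketch assumes a hypothetical 90-day half-life and a made-up viewing history measured in days.

```python
def decayed_genre_profile(events, now, half_life_days=90.0):
    """Turn (day, genre) viewing events into genre shares where recent views count more.

    Each event's weight halves every `half_life_days` days.
    """
    weights = {}
    for day, genre in events:
        weight = 0.5 ** ((now - day) / half_life_days)
        weights[genre] = weights.get(genre, 0.0) + weight
    total = sum(weights.values())
    return {genre: w / total for genre, w in weights.items()}


# Hypothetical history: a horror binge two years ago, documentaries in the last month.
events = [(0, "horror"), (10, "horror"), (20, "horror"), (700, "documentary"), (710, "documentary")]
profile = decayed_genre_profile(events, now=720)
print({genre: round(share, 3) for genre, share in profile.items()})
```

Even though horror has more raw views, the recent documentaries dominate the profile.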
How Do We Know If It's Working?
Evaluating a recommendation system is genuinely hard — "good" recommendations are subjective, contextual, and partially unmeasurable. I categorise evaluation into offline, online, and qualitative dimensions:
RMSE (Root Mean Squared Error): Measures the average prediction error on known ratings. Lower is better. The Netflix Prize used this as the primary metric.
Precision@K: Of the top K items recommended, what fraction are actually relevant? Critical for homepage carousels.
Recall@K: Of all relevant items, what fraction appear in the top K? Often combined with Precision@K into an F1 score.
NDCG@K (Normalised Discounted Cumulative Gain): Rewards relevant items near the top of the ranked list and discounts those appearing further down.
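All four offline metrics are short functions. This sketch assumes binary relevance for NDCG (an item is either relevant or not), and the recommendation list and relevant set are toy data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over known ratings (lower is better)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: a hit at 0-based position i is worth 1 / log2(i + 2)."""
    dcg = sum(1 / math.log2(i + 2) for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(recommended, relevant, 3),
      recall_at_k(recommended, relevant, 3),
      round(ndcg_at_k(recommended, relevant, 3), 3))
print(rmse([3.0, 4.0], [3.0, 2.0]))
```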
🧪 Online Evaluation — A/B Testing
Offline metrics don't always translate to real-world performance. The gold standard in industry is A/B testing: split users into control (current system) and treatment (new algorithm) groups, then measure click-through rate, watch time, or retention. Netflix runs hundreds of A/B tests simultaneously.
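Whether a difference in click-through rate between the two groups is real or just noise is typically checked with a significance test. A two-proportion z-test is one textbook option, sketched below with made-up counts; it is not a description of how Netflix analyses its own tests.

```python
import math


def two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Compare the click-through rates of control (a) and treatment (b)."""
    ctr_a = clicks_a / users_a
    ctr_b = clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (ctr_b - ctr_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return z, p_value


# Hypothetical results: 20,000 users per arm, CTR 5.0% vs 5.5%.
z, p = two_proportion_z_test(clicks_a=1_000, users_a=20_000, clicks_b=1_100, users_b=20_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```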
A system that recommends only highly rated superhero films to everyone might score well on precision, but it destroys diversity. Modern evaluation therefore also measures Intra-List Diversity (how varied the recommendations are) and Serendipity (how surprising and delightful an unexpected discovery is). These cannot be captured by RMSE alone.
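Intra-List Diversity is commonly computed as the average pairwise dissimilarity between the items in a list. This sketch assumes hypothetical one-hot genre vectors and uses 1 − cosine similarity as the dissimilarity.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intra_list_diversity(item_vectors):
    """Average pairwise dissimilarity (1 − cosine) across a recommendation list."""
    pairs = list(combinations(item_vectors, 2))
    return sum(1 - cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical one-hot genre vectors: [superhero, drama, documentary]
all_superhero = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
mixed = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(intra_list_diversity(all_superhero), intra_list_diversity(mixed))  # prints 0.0 1.0
```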
What I've Learned — and Why It Matters
Recommendation systems are one of the most commercially impactful applications of machine learning in the world. They sit at the intersection of mathematics, psychology, engineering, and ethics. From the elegant geometry of cosine similarity to the industrial-scale Two-Tower neural architectures running at Google — every concept traces back to one fundamental question: how do we understand human preferences and serve people what they truly need? As I continue this ML journey, I'm most excited about where deep learning, LLMs, and personalisation are converging — the next generation of recommenders won't just know what you want to watch; they'll understand why.