Recommendation systems are one of those engineering problems that feel larger than they are. Netflix, Spotify, and Amazon all build on variations of a few core ideas. This post walks through how I built one from first principles using PyTorch.
The core idea: matrix factorisation
At its simplest, a recommender is just a function: given a user and a catalogue of items, rank items by predicted preference. The cleanest way to do this is to learn latent embeddings for both users and items, then rank by dot-product similarity.
The model doesn't know what genre means. It learns that users who clicked A also clicked B, and infers a shared dimension from the pattern.
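To make the ranking step concrete, here is a minimal sketch of scoring by dot product. The embeddings are random stand-ins rather than trained values, and the sizes and the `rank_items` helper are made up for illustration.

```python
import torch

torch.manual_seed(0)

n_users, n_items, dim = 4, 10, 8  # toy sizes, chosen only for illustration

# Random stand-ins for learned embeddings (a trained model would supply these)
user_vecs = torch.randn(n_users, dim)
item_vecs = torch.randn(n_items, dim)

def rank_items(user_id, k=3):
    """Return the indices of the k items with the highest dot-product score."""
    scores = item_vecs @ user_vecs[user_id]  # one score per item, shape (n_items,)
    return torch.topk(scores, k).indices.tolist()

top3 = rank_items(user_id=0)
print(top3)
```

Because every item is scored with a single matrix-vector product, ranking a whole catalogue for one user stays cheap even when the catalogue is large.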
Setting up the training loop
The dataset is a user-item interaction matrix — rows are users, columns are items, values are implicit signals (clicks, plays, views). Negative sampling creates non-interaction pairs to train against.
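Negative sampling can be sketched roughly as follows. This assumes uniform sampling over items the user has not interacted with, which is one common choice and not necessarily the exact scheme I used. The `sample_negatives` helper and the toy interactions are hypothetical.

```python
import random

def sample_negatives(interactions, n_items, n_neg=1, seed=0):
    """Pair each observed (user, item) with n_neg items the user never touched.

    Returns (user, item, label) triples: label 1.0 for observed pairs,
    0.0 for sampled negatives.
    """
    rng = random.Random(seed)

    # Collect the set of items each user has interacted with
    seen = {}
    for u, i in interactions:
        seen.setdefault(u, set()).add(i)

    triples = []
    for u, i in interactions:
        triples.append((u, i, 1.0))
        if len(seen[u]) >= n_items:
            continue  # this user has touched every item, so no negatives exist
        for _ in range(n_neg):
            j = rng.randrange(n_items)
            while j in seen[u]:  # resample until we hit an unseen item
                j = rng.randrange(n_items)
            triples.append((u, j, 0.0))
    return triples

interactions = [(0, 1), (0, 3), (1, 2)]  # toy (user, item) pairs
triples = sample_negatives(interactions, n_items=5, n_neg=2)
```

With implicit feedback, a sampled "negative" only means "no recorded interaction", so some of these pairs will be items the user would actually like. The ratio of negatives to positives (`n_neg` here) is a tuning knob.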
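import torch.nn as nn

class MatrixFactorisation(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        # One learned dim-sized vector per user and per item
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        # Row-wise dot product: one score per (user, item) pair in the batch
        return (self.user_emb(u) * self.item_emb(i)).sum(1)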
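A training loop around this model might look like the sketch below. The loss (binary cross-entropy on the raw dot-product scores), the Adam optimiser, the learning rate, and the synthetic batch are all assumptions made for illustration; treat it as one plausible setup rather than the exact code I ran.

```python
import torch
import torch.nn as nn

class MatrixFactorisation(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        return (self.user_emb(u) * self.item_emb(i)).sum(1)

torch.manual_seed(0)

model = MatrixFactorisation(n_users=50, n_items=100)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed optimiser and rate
loss_fn = nn.BCEWithLogitsLoss()  # treats each dot product as a logit

# Synthetic batch standing in for real (user, item, label) triples
users = torch.randint(0, 50, (256,))
items = torch.randint(0, 100, (256,))
labels = torch.randint(0, 2, (256,)).float()  # 1.0 = interaction, 0.0 = sampled negative

losses = []
for epoch in range(20):
    optimiser.zero_grad()
    logits = model(users, items)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimiser.step()
    losses.append(loss.item())
```

On real data you would iterate over mini-batches from a `DataLoader` and resample negatives each epoch. The single fixed batch here only shows the shape of one optimisation step.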
Results and reflections
After 20 epochs on 500K interactions, the model produces genuinely useful recommendations — much better than popularity-based baselines. The hardest part was not the model; it was getting clean interaction data.
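For readers who want to run the same kind of comparison, here is a rough sketch of a popularity baseline scored with hit rate at k. The metric, the helper names, and the toy data are all assumptions for illustration. They are not the evaluation behind the result above.

```python
from collections import Counter

def popularity_top_k(train_pairs, k):
    """Recommend the same k most-interacted-with items to everyone."""
    counts = Counter(item for _, item in train_pairs)
    return [item for item, _ in counts.most_common(k)]

def hit_rate_at_k(recs_by_user, held_out):
    """Fraction of users whose held-out item appears in their recommendations."""
    hits = sum(1 for u, item in held_out.items() if item in recs_by_user[u])
    return hits / len(held_out)

# Toy data: (user, item) training pairs and one held-out item per user
train_pairs = [(0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 3)]
held_out = {0: 3, 1: 4, 2: 2}

top2 = popularity_top_k(train_pairs, k=2)
recs = {u: top2 for u in held_out}
baseline_hr = hit_rate_at_k(recs, held_out)
print(top2, baseline_hr)
```

Scoring the factorisation model with the same `hit_rate_at_k` function, using each user's top-k dot-product items as `recs_by_user`, gives a like-for-like comparison. A fuller version would also exclude items the user has already interacted with.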