Pairwise Ranking Problem: ML Interview
I got this problem from the Leetcode Discussion. The link of the discussion is: ml-interview-discussion
I guess you have read that and attempted once... No issue. First, a fall, don't fear, it is related to an ML system design problem.
PS: YOUR APPROACH CAN BE DIFFERENT.
Hint: It is the pair-wise comparison problem (cyclic, conflicting)... Yes, related to graph methods towards the stats. Try Again.... by yourself....
Let's break down,
1. What's the Problem??
You have many “likes” of the form
User U says: Video A > Video B
But:
- There may be loops like A > B, B > C, C > A (no single ordering fits them all).
- Different users may disagree (User 1 says A > B, User 2 says B > A).
Your goal: take all these noisy, possibly cyclic, pairwise signals and produce for each user a ranked list of videos from “most preferred” down to “least preferred.”
2. Why does a plain graph algorithm fail?
- Topological sort only works on a Directed Acyclic Graph (DAG).
- Cycles break the DAG assumption; topo‑sort won’t produce anything.
- Conflicts across users mean there is no single set of “true” edges you can trust absolutely.
We need to switch from “find a perfect order” to “learn a score for each video, a higher score means more likely preferred.”
3. The core idea: turn comparisons into scores
Instead of forcing a strict ordering, we aim to learn a real-valued score for each video i. Then we recommend by sorting videos in descending order of .
Key benefit: even if sometimes and other times, the best-fit scores will place A and B somewhere on the real line, maybe: = 0.8 and = 0.7, reflecting that overall A tends to beat B.
4. Basic statistical model: the Bradley–Terry (BT) model
This is often the first thing you learn in “learning from pairwise comparisons.”
- 4.1 Assumption: Each video has a positive “strength” . The probability that video beats video is:
- 4.2 Learning: Given all observed wins and losses across all users, we pick the values that maximize the likelihood. In practice, we optimize the following objective using a simple iterative procedure:
- 4.3 Recommendation: Translate strengths to scores:
For each user, show videos in order of decreasing .
5. Personalization: modeling each user
5.1: Simple User Bias:
We model the probability as:
Since appears in both the numerator and the denominator, it cancels out:
Conclusion: This simple bias doesn’t change the ranking; it’s equivalent to the global model.
5.2: Latent-Factor Personalization (BPR): A more powerful approach is to embed users and items in the same low-dimensional space:
- Each user
- Each video
Define a personalized score:
Model the probability that user prefers video over video as:
We train the user and item embeddings by maximizing the pairwise log-likelihood over all observed preference triples :
Inference: For a given user , compute for every video , and sort videos in descending order of score.
This method is called Bayesian Personalized Ranking (BPR).
6. Adding side information with a neural ranker
If you have extra features, like video metadata, frame embeddings, or user demographics, you can build a small neural network to model preferences.
6.1 Inputs
- Video feature vectors: ,
- User feature vector:
6.2 Model
Compute a combined representation:
Pass through a multi-layer perceptron (MLP) that outputs a single real number (the preference score for user preferring over ).
6.3 Loss
We use the same pairwise logistic loss:
Where is the sigmoid function.
6.4 Recommendation
To score each pair, we can:
- Compare against a fixed "anchor" video
- Or train a separate pointwise head that directly predicts a relevance score
Sort videos for each user by that score.
Note: This method is more flexible (can capture complex interactions), but it also requires more data and careful regularization.
7. Putting It into Practice: A Beginner’s Recipe
7.1 Data Preparation
- Gather all user comparisons as a list of triples , where user prefers video over .
7.2 Start Simple: Global Bradley-Terry (BT)
- Implement the BT model using existing libraries or a few lines of Python (e.g., using iterative updates or maximum likelihood).
- Inspect the resulting video strengths .
- Plot a histogram of to understand the distribution of scores.
7.3 Move to Personalization: Bayesian Personalized Ranking (BPR)
- Use a lightweight library like
implicit
in Python, or implement your own small stochastic gradient descent (SGD) loop. - Choose a latent dimension .
- Train for a few epochs and monitor held-out accuracy, e.g., how often your model correctly predicts a withheld comparison.
7.4 Add Features & Upgrade to a Neural Model
Extract basic video features (e.g., TF-IDF on titles, or pretrained CNN frame embeddings).
Build a small MLP in TensorFlow or PyTorch to take in features:
- User features:
- Video features: ,
- Input vector:
Train the model with the same pairwise logistic loss:
7.5 Evaluate
Offline:
- Accuracy on held-out comparison pairs
- AUC of pairwise predictions
Online (when possible):
- A/B test your recommendations against a baseline system
7.6 Deploy & Iterate
- Serve scores in real-time by computing on the fly. This is efficient if your model (especially BPR or MLP) is small and fast.
- Continuously collect new comparisons.
- Periodically retrain:
- BT and BPR models are cheap to update incrementally.
Bottom line for a beginner:
Step 1: Learn and implement Bradley-Terry for global ranking.
Step 2: Add BPR matrix factorization to personalize.
Step 3 (optional): Build a neural pairwise ranker if you have rich features.
Small Implementation
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
# ====== Bradley-Terry Model ======
class BradleyTerry:
def __init__(self, n_items, lr=0.1, max_iter=1000, tol=1e-6):
self.n = n_items
self.lr = lr
self.max_iter = max_iter
self.tol = tol
self.w = np.ones(n_items) # initial strengths
def fit(self, pairs):
"""
pairs: list of (i, j) meaning i beats j
"""
w = self.w.copy()
for it in range(self.max_iter):
w_prev = w.copy()
# gradient ascent on log-likelihood
grad = np.zeros_like(w)
for i, j in pairs:
grad[i] += 1 - w[i] / (w[i] + w[j])
grad[j] += 0 - w[j] / (w[i] + w[j])
w += self.lr * grad
# keep positive
w = np.clip(w, 1e-8, None)
if np.linalg.norm(w - w_prev) < self.tol:
break
self.w = w
return self
def scores(self):
# return log-strengths as scores
return np.log(self.w)
# ====== BPR Matrix Factorization ======
class BPR(nn.Module):
def __init__(self, n_users, n_items, emb_dim=16):
super().__init__()
self.user_emb = nn.Embedding(n_users, emb_dim)
self.item_emb = nn.Embedding(n_items, emb_dim)
nn.init.normal_(self.user_emb.weight, std=0.01)
nn.init.normal_(self.item_emb.weight, std=0.01)
def forward(self, u, i, j):
# returns predicted preference difference s(u,i) - s(u,j)
u_vec = self.user_emb(u)
i_vec = self.item_emb(i)
j_vec = self.item_emb(j)
return (u_vec * (i_vec - j_vec)).sum(dim=1)
def predict_scores(self, u):
# compute scores for all items for user u
with torch.no_grad():
u_vec = self.user_emb(u.unsqueeze(1))
all_items = self.item_emb.weight
scores = (u_vec * all_items).sum(dim=2).squeeze(1)
return scores
# ====== Example Usage ======
if __name__ == '__main__':
# Sample data: 4 videos, synthetic comparisons
pairs = [(0,1), (0,2), (1,2), (2,3), (3,1), (3,0)]
bt = BradleyTerry(n_items=4)
bt.fit(pairs)
print("Bradley-Terry scores:", bt.scores())
# Synthetic user comparisons: user 0 prefers 0>1,1>2; user1:2>1,2>0
user_items = [(0,0,1), (0,1,2), (1,2,1), (1,2,0)]
n_users, n_items = 2, 3
device = torch.device('cpu')
model = BPR(n_users, n_items, emb_dim=8).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.LogSigmoid()
# Training loop
for epoch in range(100):
total_loss = 0
for u,i,j in user_items:
u_t = torch.LongTensor([u]).to(device)
i_t = torch.LongTensor([i]).to(device)
j_t = torch.LongTensor([j]).to(device)
x = model(u_t, i_t, j_t)
loss = -loss_fn(x).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
total_loss += loss.item()
if epoch % 20 == 0:
print(f"Epoch {epoch}: loss={total_loss:.4f}")
# Inference for user 0
u0 = torch.LongTensor([0]).to(device)
scores = model.predict_scores(u0)
print("BPR scores for user 0:", scores)
I sincerely thank you for your time and interest in this work.
~ Vayishu (Vishal S Pandey)