Vishal Pandey | Applied ML Research

Pairwise Ranking Problem: ML Interview

I got this problem from the Leetcode Discussion. The link of the discussion is: ml-interview-discussion

I guess you have read that and attempted once... No issue. First, a fall, don't fear, it is related to an ML system design problem.

PS: YOUR APPROACH CAN BE DIFFERENT.

Hint: It is the pair-wise comparison problem (cyclic, conflicting)... Yes, related to graph methods towards the stats. Try Again.... by yourself....

Let's break down,

1. What's the Problem??

You have many “likes” of the form

User U says: Video A > Video B  

But:

Your goal: take all these noisy, possibly cyclic, pairwise signals and produce for each user a ranked list of videos from “most preferred” down to “least preferred.”

2. Why does a plain graph algorithm fail?

We need to switch from “find a perfect order” to “learn a score for each video, a higher score means more likely preferred.”

3. The core idea: turn comparisons into scores

Instead of forcing a strict A>B ordering, we aim to learn a real-valued score si for each video i. Then we recommend by sorting videos in descending order of si.

Key benefit: even if A>B sometimes and B>A other times, the best-fit scores will place A and B somewhere on the real line, maybe: sA = 0.8 and sB = 0.7, reflecting that overall A tends to beat B.

4. Basic statistical model: the Bradley–Terry (BT) model

This is often the first thing you learn in “learning from pairwise comparisons.”

P(ij)=wiwi+wj (ij)log(wiwi+wj) si=logwi

For each user, show videos in order of decreasing si.

5. Personalization: modeling each user

5.1: Simple User Bias:

We model the probability as:

Pu(ij)=exp(wi+bu)exp(wi+bu)+exp(wj+bu)

Since bu appears in both the numerator and the denominator, it cancels out:

Pu(ij)=wiwi+wj

Conclusion: This simple bias doesn’t change the ranking; it’s equivalent to the global model.

5.2: Latent-Factor Personalization (BPR): A more powerful approach is to embed users and items in the same low-dimensional space:

Define a personalized score:

su,i=uvi

Model the probability that user u prefers video i over video j as:

Pu(ij)=σ(su,isu,j)=11+e(su,isu,j)

We train the user and item embeddings {u,v} by maximizing the pairwise log-likelihood over all observed preference triples (u,ij):

(u,ij)logσ(su,isu,j)

Inference: For a given user u, compute su,i for every video i, and sort videos in descending order of score.

This method is called Bayesian Personalized Ranking (BPR).

6. Adding side information with a neural ranker

If you have extra features, like video metadata, frame embeddings, or user demographics, you can build a small neural network to model preferences.

6.1 Inputs

6.2 Model

Compute a combined representation:

ϕ(u,i,j)=[zu,xi,xj,xixj]

Pass ϕ through a multi-layer perceptron (MLP) hθ(ϕ) that outputs a single real number (the preference score for user u preferring i over j).

6.3 Loss

We use the same pairwise logistic loss:

L=logσ(hθ(ϕ(u,i,j)))

Where σ is the sigmoid function.

6.4 Recommendation

To score each (u,i) pair, we can:

Sort videos for each user by that score.

Note: This method is more flexible (can capture complex interactions), but it also requires more data and careful regularization.

7. Putting It into Practice: A Beginner’s Recipe

7.1 Data Preparation

7.2 Start Simple: Global Bradley-Terry (BT)

7.3 Move to Personalization: Bayesian Personalized Ranking (BPR)

7.4 Add Features & Upgrade to a Neural Model

L=(u,ij)logσ(hθ(ϕ(u,i,j)))

7.5 Evaluate

7.6 Deploy & Iterate


Bottom line for a beginner:
Step 1: Learn and implement Bradley-Terry for global ranking.
Step 2: Add BPR matrix factorization to personalize.
Step 3 (optional): Build a neural pairwise ranker if you have rich features.


Small Implementation

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# ====== Bradley-Terry Model ======
class BradleyTerry:
    def __init__(self, n_items, lr=0.1, max_iter=1000, tol=1e-6):
        self.n = n_items
        self.lr = lr
        self.max_iter = max_iter
        self.tol = tol
        self.w = np.ones(n_items)  # initial strengths

    def fit(self, pairs):
        """
        pairs: list of (i, j) meaning i beats j
        """
        w = self.w.copy()
        for it in range(self.max_iter):
            w_prev = w.copy()
            # gradient ascent on log-likelihood
            grad = np.zeros_like(w)
            for i, j in pairs:
                grad[i] += 1 - w[i] / (w[i] + w[j])
                grad[j] += 0 - w[j] / (w[i] + w[j])
            w += self.lr * grad
            # keep positive
            w = np.clip(w, 1e-8, None)
            if np.linalg.norm(w - w_prev) < self.tol:
                break
        self.w = w
        return self

    def scores(self):
        # return log-strengths as scores
        return np.log(self.w)

# ====== BPR Matrix Factorization ======
class BPR(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        nn.init.normal_(self.user_emb.weight, std=0.01)
        nn.init.normal_(self.item_emb.weight, std=0.01)

    def forward(self, u, i, j):
        # returns predicted preference difference s(u,i) - s(u,j)
        u_vec = self.user_emb(u)
        i_vec = self.item_emb(i)
        j_vec = self.item_emb(j)
        return (u_vec * (i_vec - j_vec)).sum(dim=1)

    def predict_scores(self, u):
        # compute scores for all items for user u
        with torch.no_grad():
            u_vec = self.user_emb(u.unsqueeze(1))               
            all_items = self.item_emb.weight                   
            scores = (u_vec * all_items).sum(dim=2).squeeze(1)
        return scores

# ====== Example Usage ======
if __name__ == '__main__':
    # Sample data: 4 videos, synthetic comparisons
    pairs = [(0,1), (0,2), (1,2), (2,3), (3,1), (3,0)]
    bt = BradleyTerry(n_items=4)
    bt.fit(pairs)
    print("Bradley-Terry scores:", bt.scores())

    # Synthetic user comparisons: user 0 prefers 0>1,1>2; user1:2>1,2>0
    user_items = [(0,0,1), (0,1,2), (1,2,1), (1,2,0)]
    n_users, n_items = 2, 3
    device = torch.device('cpu')
    model = BPR(n_users, n_items, emb_dim=8).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.LogSigmoid()

    # Training loop
    for epoch in range(100):
        total_loss = 0
        for u,i,j in user_items:
            u_t = torch.LongTensor([u]).to(device)
            i_t = torch.LongTensor([i]).to(device)
            j_t = torch.LongTensor([j]).to(device)
            x = model(u_t, i_t, j_t)
            loss = -loss_fn(x).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if epoch % 20 == 0:
            print(f"Epoch {epoch}: loss={total_loss:.4f}")

    # Inference for user 0
    u0 = torch.LongTensor([0]).to(device)
    scores = model.predict_scores(u0)
    print("BPR scores for user 0:", scores)

I sincerely thank you for your time and interest in this work.

~ Vayishu (Vishal S Pandey)