Pairwise Ranking Problem: ML Interview

19 Jul, 2025

I got this problem from the Leetcode Discussion. The link of the discussion is: ml-interview-discussion

I guess you have read that and attempted once... No issue. First, a fall, don't fear, it is related to an ML system design problem.

PS: YOUR APPROACH CAN BE DIFFERENT.

Hint: It is the pair-wise comparison problem (cyclic, conflicting)... Yes, related to graph methods towards the stats. Try Again.... by yourself....

Let's break down,

1. What's the Problem??

You have many “likes” of the form

User U says: Video A > Video B

But:

There may be loops like A > B, B > C, C > A (no single ordering fits them all).
Different users may disagree (User 1 says A > B, User 2 says B > A).

Your goal: take all these noisy, possibly cyclic, pairwise signals and produce for each user a ranked list of videos from “most preferred” down to “least preferred.”

2. Why does a plain graph algorithm fail?

Topological sort only works on a Directed Acyclic Graph (DAG).
Cycles break the DAG assumption; topo‑sort won’t produce anything.
Conflicts across users mean there is no single set of “true” edges you can trust absolutely.

We need to switch from “find a perfect order” to “learn a score for each video, a higher score means more likely preferred.”

3. The core idea: turn comparisons into scores

Instead of forcing a strict $A > B$ ordering, we aim to learn a real-valued score $s_{i}$ for each video i. Then we recommend by sorting videos in descending order of $s_{i}$ .

Key benefit: even if $A > B$ sometimes and $B > A$ other times, the best-fit scores will place A and B somewhere on the real line, maybe: $s_{A}$ = 0.8 and $s_{B}$ = 0.7, reflecting that overall A tends to beat B.

4. Basic statistical model: the Bradley–Terry (BT) model

This is often the first thing you learn in “learning from pairwise comparisons.”

4.1 Assumption: Each video $i$ has a positive “strength” $w_{i}$ . The probability that video $i$ beats video $j$ is:

P (i ≻ j) = \frac{w_{i}}{w_{i} + w_{j}}

4.2 Learning: Given all observed wins and losses across all users, we pick the $w_{i}$ values that maximize the likelihood. In practice, we optimize the following objective using a simple iterative procedure:

\sum_{(i ≻ j)} \log (\frac{w_{i}}{w_{i} + w_{j}})

4.3 Recommendation: Translate strengths to scores:

s_{i} = \log w_{i}

For each user, show videos in order of decreasing $s_{i}$ .

5. Personalization: modeling each user

5.1: Simple User Bias:

We model the probability as:

P_{u} (i ≻ j) = \frac{\exp (w_{i} + b_{u})}{\exp (w_{i} + b_{u}) + \exp (w_{j} + b_{u})}

Since $b_{u}$ appears in both the numerator and the denominator, it cancels out:

P_{u} (i ≻ j) = \frac{w_{i}}{w_{i} + w_{j}}

Conclusion: This simple bias doesn’t change the ranking; it’s equivalent to the global model.

5.2: Latent-Factor Personalization (BPR): A more powerful approach is to embed users and items in the same low-dimensional space:

Each user $u \in ℝ^{d}$
Each video $v_{i} \in ℝ^{d}$

Define a personalized score:

s_{u, i} = u^{⊤} v_{i}

Model the probability that user $u$ prefers video $i$ over video $j$ as:

P_{u} (i ≻ j) = σ (s_{u, i} - s_{u, j}) = \frac{1}{1 + e^{- (s_{u, i} - s_{u, j})}}

We train the user and item embeddings ${u, v}$ by maximizing the pairwise log-likelihood over all observed preference triples $(u, i ≻ j)$ :

\sum_{(u, i ≻ j)} \log σ (s_{u, i} - s_{u, j})

Inference: For a given user $u$ , compute $s_{u, i}$ for every video $i$ , and sort videos in descending order of score.

This method is called Bayesian Personalized Ranking (BPR).

6. Adding side information with a neural ranker

If you have extra features, like video metadata, frame embeddings, or user demographics, you can build a small neural network to model preferences.

6.1 Inputs

Video feature vectors: $x_{i}$ , $x_{j}$
User feature vector: $z_{u}$

6.2 Model

Compute a combined representation:

ϕ (u, i, j) = [z_{u}, x_{i}, x_{j}, x_{i} - x_{j}]

Pass $ϕ$ through a multi-layer perceptron (MLP) $h_{θ} (ϕ)$ that outputs a single real number (the preference score for user $u$ preferring $i$ over $j$ ).

6.3 Loss

We use the same pairwise logistic loss:

L = - \sum \log σ (h_{θ} (ϕ (u, i, j)))

Where $σ$ is the sigmoid function.

6.4 Recommendation

To score each $(u, i)$ pair, we can:

Compare $i$ against a fixed "anchor" video
Or train a separate pointwise head $h_{θ}^{'} (z_{u}, x_{i})$ that directly predicts a relevance score

Sort videos for each user by that score.

Note: This method is more flexible (can capture complex interactions), but it also requires more data and careful regularization.

7. Putting It into Practice: A Beginner’s Recipe

7.1 Data Preparation

Gather all user comparisons as a list of triples $(u, i, j)$ , where user $u$ prefers video $i$ over $j$ .

7.2 Start Simple: Global Bradley-Terry (BT)

Implement the BT model using existing libraries or a few lines of Python (e.g., using iterative updates or maximum likelihood).
Inspect the resulting video strengths $w_{i}$ .
Plot a histogram of $\log w_{i}$ to understand the distribution of scores.

7.3 Move to Personalization: Bayesian Personalized Ranking (BPR)

Use a lightweight library like implicit in Python, or implement your own small stochastic gradient descent (SGD) loop.
Choose a latent dimension $d \approx 10$ .
Train for a few epochs and monitor held-out accuracy, e.g., how often your model correctly predicts a withheld comparison.

7.4 Add Features & Upgrade to a Neural Model

Extract basic video features (e.g., TF-IDF on titles, or pretrained CNN frame embeddings).
Build a small MLP in TensorFlow or PyTorch to take in $(u, i, j)$ features:
- User features: $z_{u}$
- Video features: $x_{i}$ , $x_{j}$
- Input vector: $ϕ (u, i, j) = [z_{u}, x_{i}, x_{j}, x_{i} - x_{j}]$
Train the model with the same pairwise logistic loss:

L = - \sum_{(u, i ≻ j)} \log σ (h_{θ} (ϕ (u, i, j)))

7.5 Evaluate

Offline:
- Accuracy on held-out comparison pairs
- AUC of pairwise predictions
Online (when possible):
- A/B test your recommendations against a baseline system

7.6 Deploy & Iterate

Serve scores in real-time by computing $s_{u, i}$ on the fly. This is efficient if your model (especially BPR or MLP) is small and fast.
Continuously collect new comparisons.
Periodically retrain:
- BT and BPR models are cheap to update incrementally.

Bottom line for a beginner:
Step 1: Learn and implement Bradley-Terry for global ranking.
Step 2: Add BPR matrix factorization to personalize.
Step 3 (optional): Build a neural pairwise ranker if you have rich features.

Small Implementation

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# ====== Bradley-Terry Model ======
class BradleyTerry:
    def __init__(self, n_items, lr=0.1, max_iter=1000, tol=1e-6):
        self.n = n_items
        self.lr = lr
        self.max_iter = max_iter
        self.tol = tol
        self.w = np.ones(n_items)  # initial strengths

    def fit(self, pairs):
        """
        pairs: list of (i, j) meaning i beats j
        """
        w = self.w.copy()
        for it in range(self.max_iter):
            w_prev = w.copy()
            # gradient ascent on log-likelihood
            grad = np.zeros_like(w)
            for i, j in pairs:
                grad[i] += 1 - w[i] / (w[i] + w[j])
                grad[j] += 0 - w[j] / (w[i] + w[j])
            w += self.lr * grad
            # keep positive
            w = np.clip(w, 1e-8, None)
            if np.linalg.norm(w - w_prev) < self.tol:
                break
        self.w = w
        return self

    def scores(self):
        # return log-strengths as scores
        return np.log(self.w)

# ====== BPR Matrix Factorization ======
class BPR(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        nn.init.normal_(self.user_emb.weight, std=0.01)
        nn.init.normal_(self.item_emb.weight, std=0.01)

    def forward(self, u, i, j):
        # returns predicted preference difference s(u,i) - s(u,j)
        u_vec = self.user_emb(u)
        i_vec = self.item_emb(i)
        j_vec = self.item_emb(j)
        return (u_vec * (i_vec - j_vec)).sum(dim=1)

    def predict_scores(self, u):
        # compute scores for all items for user u
        with torch.no_grad():
            u_vec = self.user_emb(u.unsqueeze(1))               
            all_items = self.item_emb.weight                   
            scores = (u_vec * all_items).sum(dim=2).squeeze(1)
        return scores

# ====== Example Usage ======
if __name__ == '__main__':
    # Sample data: 4 videos, synthetic comparisons
    pairs = [(0,1), (0,2), (1,2), (2,3), (3,1), (3,0)]
    bt = BradleyTerry(n_items=4)
    bt.fit(pairs)
    print("Bradley-Terry scores:", bt.scores())

    # Synthetic user comparisons: user 0 prefers 0>1,1>2; user1:2>1,2>0
    user_items = [(0,0,1), (0,1,2), (1,2,1), (1,2,0)]
    n_users, n_items = 2, 3
    device = torch.device('cpu')
    model = BPR(n_users, n_items, emb_dim=8).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.LogSigmoid()

    # Training loop
    for epoch in range(100):
        total_loss = 0
        for u,i,j in user_items:
            u_t = torch.LongTensor([u]).to(device)
            i_t = torch.LongTensor([i]).to(device)
            j_t = torch.LongTensor([j]).to(device)
            x = model(u_t, i_t, j_t)
            loss = -loss_fn(x).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if epoch % 20 == 0:
            print(f"Epoch {epoch}: loss={total_loss:.4f}")

    # Inference for user 0
    u0 = torch.LongTensor([0]).to(device)
    scores = model.predict_scores(u0)
    print("BPR scores for user 0:", scores)

I sincerely thank you for your time and interest in this work.

~ Vayishu (Vishal S Pandey)