Cheat sheet

Part 12 · KNN & Recommender Systems — Cheat Sheet

The simplest non-parametric model. K-Nearest Neighbours for classification, regression, and as the core of memory-based recommender systems.

1

The algorithm

For each new point:

  1. Compute its distance to every training point.
  2. Pick the K closest neighbours.
  3. Classify → majority vote among the K labels.
  4. Regress → mean (or weighted mean) of the K target values.

No model is fit at training time — just stored. All the work happens at inference.

2

The K knob

K controls the bias-variance trade-off:

KBehaviour
K = 1Boundary follows every training point. High variance, overfits.
Small K (3–5)Local detail, can capture small clusters.
Large K (50+)Smooth boundary. High bias, underfits.
K = NAlways predicts the majority class. Useless.

Pick K by cross-validation. Use odd values for binary classification to avoid ties.

3

Why scale always

KNN uses Euclidean distance by default:

d(x, y) = sqrt(Σ (x_i − y_i)²)

If income ranges in 0–200,000 and age ranges in 0–80, the income axis dominates every distance. The model becomes "1-NN on income" and ignores age completely.

StandardScaler before KNN. Always. No exceptions.

Same applies to clustering (KMeans), PCA, and anything else distance-based.

4

Distance metrics

KNN supports many distance functions:

MetricUse for
Euclidean (L2)Default. Continuous features.
Manhattan (L1)High-dim or robust to outliers.
Minkowski (general)Tune p parameter (1=Manhattan, 2=Euclidean).
HammingBinary / categorical features.
CosineText vectors, recommender systems.

For text and recommenders, cosine similarity (1 − cosine distance) usually beats Euclidean.

5

Weighted KNN

Default: all K neighbours vote equally.

Weighted variant: weight each neighbour by 1 / distance — closer neighbours have more say.

KNeighborsClassifier(n_neighbors=5, weights="distance")

Helps when K is moderately large and you don't want distant outliers to dilute the vote.

6

When KNN wins

  • Low-dimensional, dense data. Distances are meaningful.
  • Decision boundary is locally smooth but globally complex.
  • You can't afford to train a model but inference latency isn't critical.
  • Recommender systems — user-user or item-item similarity.
  • Anomaly detection — points far from their K nearest neighbours are suspicious.
7

When KNN loses

  • High-dimensional data — curse of dimensionality. All points are "equally far". See Part 10 / PCA cheat sheet.
  • Massive datasets at inference time. O(n) per prediction. Use ANN libraries (FAISS, Annoy) or just pick another model.
  • Imbalanced classes. Majority class wins votes; minority class disappears.
  • You need interpretability beyond "these were its 5 nearest neighbours."
8

KNN as a recommender

Memory-based collaborative filtering:

User-user CF: "Find K users similar to you (by past ratings), recommend what they liked."

Item-item CF: "Find K items similar to this one (by who rated them), suggest those."

Steps:

  1. Build a user × item rating matrix.
  2. Normalise (mean-centre each row).
  3. Compute pairwise cosine similarities between users (or items).
  4. For each unseen (user, item): weighted average of similar users' ratings of that item.

Pros: simple, no model training. Cons: cold start (new users / items), poor scaling.

Modern systems combine KNN with matrix factorisation or embeddings.