
Table of Contents
- 1. What is machine learning?
- 2. Generalization is what counts
- 3. The curse of dimensionality
- 4. More data beats a cleverer algorithm
- 5. Learn many models, not just one
- 6. Types of ML learning systems
- 7. The six main challenges
- 8. Train, validate, test — and the right way to evaluate
- 9. scikit-learn and hyperparameter tuning
- 10. Take-home points
Last update: June 2024. All opinions are my own.
Machine Learning from Scratch · Part 1/8
This is Session 1 of my Machine Learning II course, the way I actually wrote it down. The page images below are scans of my real notebook — I'm leaving them in because the diagrams and emphasis are where most of the intuition lives. The short typed lead-ins exist so the post is searchable and skimmable; the depth is in the pages themselves.
📄 Prefer the raw notes? Download the original PDF (12 pages).
What is machine learning?
Tom Mitchell's definition is still the cleanest one: a computer program learns from experience E with respect to a task T and a performance measure P, if its performance on T, as measured by P, improves with E. A spam filter is the textbook example — task: flag spam; experience: the examples users mark; measure: classification accuracy.


Every learning algorithm combines three pieces: Representation (the kind of model: linear regression, decision tree, neural net…), Evaluation (the metric that decides what counts as a good model), and Optimization (how the algorithm actually searches for it). Most of this series is really about different choices for the first two.
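To make the three pieces concrete, here is a minimal sketch of my own (it is not taken from the notebook pages). It fits a straight line with plain gradient descent on synthetic data. The data, the learning rate `lr` and the step count are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Representation: the family of models we search over (here, straight lines).
def predict(w, b, inputs):
    return w * inputs + b

# Evaluation: the score that says how good one candidate model is (mean squared error).
def mse(w, b):
    return np.mean((predict(w, b, x) - y) ** 2)

# Optimization: how we search the family for a good model (gradient descent on the MSE).
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    error = predict(w, b, x) - y
    w -= lr * 2 * np.mean(error * x)  # d(MSE)/dw
    b -= lr * 2 * np.mean(error)      # d(MSE)/db

final_mse = mse(w, b)
print(f"w={w:.2f}, b={b:.2f}, mse={final_mse:.4f}")
```

Swapping any one piece (a tree instead of a line, absolute error instead of squared error, a closed-form solve instead of gradient descent) gives a different learner while the other two stay the same.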
Generalization is what counts
The first big idea: it doesn't matter how well your model fits the data you have. What matters is how it behaves on data it hasn't seen. Overfitting is when training error is tiny but new-data error explodes; underfitting is when the model is too simple to capture the pattern at all.

Cross-validation is the practical defence: split the training data into k folds, hold out one at a time while training on the rest, and average the scores. Bigger and cleaner datasets push the sweet spot (the model complexity at which error on unseen data is lowest) further to the right, towards more complex models.
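The k-fold procedure described above can be sketched in a few lines with scikit-learn's `cross_val_score`. The iris dataset, the logistic regression model and `cv=5` are my choices for illustration, not something prescribed by the notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5: five folds, each held out once while the model trains on the other four.
scores = cross_val_score(model, X, y, cv=5)
mean_score = scores.mean()

print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

The spread of the per-fold scores is worth a look too: a large spread suggests the estimate depends heavily on which points happen to be held out.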
The curse of dimensionality
Adding features sounds like more information, but only if they actually carry signal. If they don't, you've just made the problem harder: each new feature is a new dimension, the search space explodes, and the same number of points becomes sparser.

There's also a counter-effect — the blessing of non-uniformity — because real data isn't spread uniformly across the space; it tends to live on a lower-dimensional manifold. Good feature engineering is what makes that show up.
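One way to see the sparsity effect is to keep the number of points fixed and watch what happens to distances as the dimension grows. The sketch below is an illustration under assumptions of my own: points drawn uniformly from the unit hypercube, 500 of them, and a handful of arbitrary dimensions. It reports the distance from a random query point to its nearest neighbour, and the ratio of nearest to farthest distance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500  # the same number of points in every dimension

def neighbour_stats(dim):
    """Return (nearest distance, nearest/farthest ratio) for one random query point."""
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min(), dists.min() / dists.max()

stats = {dim: neighbour_stats(dim) for dim in (2, 10, 100, 1000)}
for dim, (nearest, ratio) in stats.items():
    print(f"dim={dim:5d}  nearest={nearest:7.3f}  nearest/farthest={ratio:.3f}")
```

As the dimension grows, the nearest neighbour moves further away and the ratio creeps towards 1, so "near" and "far" stop meaning much. Uniform data is the worst case. Real data that sits on a lower-dimensional manifold behaves better, which is the blessing of non-uniformity.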
More data beats a cleverer algorithm
If you only remember one thing from this session: more good data almost always beats a fancier model. Four different algorithms converge to similar accuracy once you feed them enough examples. But "good" matters — garbage in, garbage out.
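If you want to check the more-data claim on a dataset of your own, scikit-learn's `learning_curve` measures validation accuracy at several training-set sizes. This is a hedged sketch, not a reproduction of the comparison in the notes: the digits dataset, the three models and the size fractions are my picks for illustration. Whether the curves actually converge depends on the data.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Three deliberately different model families (an illustrative choice).
models = {
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
train_fractions = [0.1, 0.3, 1.0]  # fractions of the available training data

curves = {}
for name, model in models.items():
    sizes, _, val_scores = learning_curve(
        model, X, y, train_sizes=train_fractions, cv=5,
        shuffle=True, random_state=0,
    )
    curves[name] = val_scores.mean(axis=1)  # mean validation accuracy per size
    print(name, dict(zip(sizes.tolist(), curves[name].round(3).tolist())))
```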

The four ingredients of any ML problem: a defined objective (what outcome am I trying to achieve?), levers (what inputs we can control), data (what we can collect), and models (how levers map to objective).
Learn many models, not just one
There is no single best model — the no-free-lunch theorem. Different problems suit different representations, and the strongest practical move is usually to combine several models rather than pick a winner.

[Figure: concept of an ensemble decision boundary]
Ensembles are also a defence against overfitting: train several diverse models, combine them, and the random errors tend to cancel while the real signal reinforces.
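A minimal sketch of the idea uses scikit-learn's `VotingClassifier` to combine three different model families by majority vote. The synthetic two-moons dataset and the particular trio of models are assumptions I picked for illustration. The ensemble is not guaranteed to win on every dataset or every split.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Diverse members: a linear model, a tree and a nearest-neighbour model.
members = [
    ("logreg", LogisticRegression()),
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
]

# voting="hard": each member casts one vote and the majority class wins.
ensemble = VotingClassifier(estimators=members, voting="hard")
ensemble.fit(X_train, y_train)
ensemble_score = ensemble.score(X_test, y_test)

# Fit each member on its own for comparison (VotingClassifier works on clones).
member_scores = {}
for name, clf in members:
    member_scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)

print("members:", member_scores)
print("ensemble:", ensemble_score)
```

The diversity matters more than the count: three copies of the same model make the same mistakes, so there is nothing to cancel.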
If you only remember one thing from this section: LEARN MANY MODELS, NOT JUST ONE. Combining diverse models almost always beats picking a single winner.
Types of ML learning systems
ML systems get sorted three ways: by how much human supervision they get (supervised, unsupervised, semi-supervised, reinforcement), by whether they learn incrementally (batch vs online), and by how they generalize (instance-based vs model-based).

Reinforcement learning is the one with the most distinct framing: an agent observes an environment, picks an action, gets a reward, and slowly learns a policy that maximises long-term reward.
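The act, reward, update loop can be sketched with the simplest possible setting: a multi-armed bandit with an epsilon-greedy agent. This is a deliberate simplification of my own, because a bandit has no state to observe, so the "policy" is just a preference over actions. The reward probabilities, `epsilon` and the step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Environment: three actions, each paying reward 1 with a probability hidden from the agent.
reward_probs = np.array([0.2, 0.5, 0.8])
n_actions = len(reward_probs)

# Agent: a running estimate of each action's average reward.
value_estimates = np.zeros(n_actions)
action_counts = np.zeros(n_actions, dtype=int)
epsilon = 0.1  # fraction of steps spent exploring at random

for step in range(5000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))     # explore
    else:
        action = int(np.argmax(value_estimates))  # exploit the current best guess
    reward = float(rng.random() < reward_probs[action])
    action_counts[action] += 1
    # Incremental mean: nudge the estimate towards the observed reward.
    value_estimates[action] += (reward - value_estimates[action]) / action_counts[action]

best_action = int(np.argmax(value_estimates))
print("estimates:", value_estimates.round(2), "best action:", best_action)
```

Full reinforcement learning adds states and delayed rewards on top of this loop, which is where the "long-term" part of the objective comes in.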
The six main challenges
Most of the practical pain in ML projects comes from one of six places: insufficient data, non-representative training data (sampling noise + sampling bias), poor-quality data, irrelevant features, overfitting, and underfitting.

The fixes mirror each other: more or better data tackles the first three; feature engineering tackles irrelevant features; regularization (controlled by a hyperparameter) tackles overfitting; a more powerful model or fewer constraints tackles underfitting.
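As an illustration of regularization controlled by a hyperparameter, here is a sketch with ridge regression, where `alpha` sets how strongly large coefficients are penalised. Ridge is my example of choice, not necessarily the one on the notebook page. The synthetic data has ten features, of which only the first carries signal.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption): 30 samples, 10 features, only feature 0 matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

coef_norms = {}
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:6.2f}  coefficient norm={coef_norms[alpha]:.3f}")
```

A larger `alpha` shrinks the coefficients, which constrains the model and works against overfitting. Push it too far and the model becomes too constrained to fit even the real signal, which is underfitting.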
Train, validate, test — and the right way to evaluate
A single split is not enough. Split your data three ways: training (fit the model), validation (decide which hyperparameters to use), and test (one final, untouched performance check you're not allowed to optimise against).
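One common way to get the three sets is to call scikit-learn's `train_test_split` twice: once to set the test set aside, and once to split what remains. The 60/20/20 proportions and the iris dataset are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Step 1: set aside 20% as the untouched test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: split the remaining 80% into training and validation.
# 0.25 of the remainder is 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

`stratify` keeps the class proportions the same in every split, which matters most when some classes are rare.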

Four metrics worth knowing cold: accuracy (fraction of all predictions that are correct; trustworthy mainly when classes are balanced), recall (fraction of actual positives you caught), precision (fraction of predicted positives that are real), and F1 (the harmonic mean of precision and recall, which punishes extreme values harder than the arithmetic mean would).
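A small worked example makes the four definitions concrete. The labels below are made up for illustration: 4 actual positives, 6 actual negatives, and a model that catches 2 of the positives while raising 1 false alarm.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels: TP=2, FN=2, FP=1, TN=5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 7/10
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2/4
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R)  = 4/7

arithmetic_mean = (precision + recall) / 2   # 7/12, for comparison with F1

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} mean={arithmetic_mean:.3f}")
```

F1 (about 0.571) lands below the arithmetic mean of precision and recall (about 0.583). The gap widens as the two drift further apart.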
scikit-learn and hyperparameter tuning
Every scikit-learn algorithm follows the same six-step pattern: import, set hyperparameters, split, fit, predict, evaluate. Once that pattern is internalised, every algorithm in this series looks the same from the outside.
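Here is the six-step pattern written out once, as a sketch. The decision tree, the breast-cancer dataset bundled with scikit-learn, and the `max_depth=3` setting are assumptions for illustration. Any other estimator would slot into the same skeleton.

```python
# 1. Import
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 2. Set hyperparameters (chosen before training, not learned from the data)
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# 3. Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 4. Fit
model.fit(X_train, y_train)

# 5. Predict
y_pred = model.predict(X_test)

# 6. Evaluate
accuracy = accuracy_score(y_test, y_pred)
print("test accuracy:", accuracy)
```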

The catch with hyperparameter tuning: if you tune by checking your test set, the test set is no longer a fair estimate of generalization. The fix is a validation set or, better, k-fold cross-validation, which averages the validation score across the k folds and gives a more reliable read.
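Tuning with cross-validation while keeping the test set untouched can be sketched with scikit-learn's `GridSearchCV`. The k-nearest-neighbours model, the grid of `n_neighbors` values and the iris dataset are my choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The test set is carved off first and never seen by the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Every candidate value is scored by 5-fold cross-validation on the training data only.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)

# One final check on the untouched test set, using the refitted best model.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

If the test accuracy tempts you to go back and adjust the grid, the test set has quietly become a second validation set.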
Take-home points
- Beware of overfitting — your model must work well on data it hasn't seen.
- Feature-engineer against the curse of dimensionality: too many irrelevant features hurt, so select what matters.
- More features aren't always good, but more (good) data almost always is.
- Combine data with expertise — domain knowledge from people who understand the problem.
- Ensemble many different models: diverse models combined usually beat any single one.
That's the whole foundation of the field, in one session. Next we get our hands dirty with the unglamorous step that decides whether any of this even has a chance: cleaning and understanding data.
📄 Download the full Session 1 PDF if you'd rather read it all in my handwriting.
