Part 1 · Introduction to NLP — Cheat Sheet

The 5-level ladder, the hard problems, and the field map. Designed for fast revision and quick lookup before the rest of the series.


1

What NLP is

Definition. The automatic analysis and understanding of natural language, in support of communication between computers and humans.

  • Natural language = how humans actually speak. Artificial language = Java, Python, SQL.
  • NLP ≠ Neuro-Linguistic Programming. Same letters, very different thing.
  • NLP = AI + Computational Linguistics. Modern stack = ML / Deep Learning + CL.
  • NLP = systems that understand language. NLP × ML = systems that learn to understand language.

2

What NLP solves

| Task | What it does |
|---|---|
| Question answering | Natural-language queries over a dataset. |
| Machine translation | Language A → language B (Google Translate, DeepL). |
| Text generation | Autocomplete, Smart Compose, full reports. |
| Fake news detection | Classify text as truthful or misleading. |
| Summarization | Long text → short text, meaning preserved. |
| Information extraction | Pull structured fields from free text. |
| Text classification | Spam vs not-spam, sentiment, intent. |

The underlying question: if I have information in natural language, can I use it to solve a business problem?
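As one concrete instance of the table above, information extraction can be sketched with a few regular expressions. This is a minimal illustration only: the sample sentence, the field names, and the `extract_fields` helper are all hypothetical, and real extraction systems usually rely on learned models rather than hand-written patterns.

```python
import re

# Hypothetical free-text input; the sentence and field names are invented
# for illustration and do not come from any real dataset.
TEXT = "Invoice 4821 issued on 2024-03-15 to Acme Corp for $1,250.00."

# One pattern per structured field we want to pull out of the text.
PATTERNS = {
    "invoice_id": r"Invoice\s+(\d+)",
    "date": r"(\d{4}-\d{2}-\d{2})",
    "amount": r"\$([\d,]+\.\d{2})",
}


def extract_fields(text):
    """Return a dict of field name -> captured string (None if not found)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(TEXT))
```

The patterns break as soon as the wording changes ("Inv. no. 4821", "15 March 2024"), which is the usual argument for moving from rules to learned extractors.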

3

The 5 levels (the spine)

| # | Level | Question | Status |
|---|---|---|---|
| 1 | Morphology | What are the tokens? | 🟢 mostly solved |
| 2 | Syntax | How do words relate? | 🟢 mostly solved |
| 3 | Semantics | What do words together mean? | 🟡 making progress |
| 4 | Pragmatics | What's the speaker's intent? | 🔴 still really hard |
| 5 | Inference | What's true that wasn't said? | 🔴 still really hard |

Read it as a ladder. Every NLP task lives on one rung (occasionally two). You can't do rung n without rung n−1.
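The ladder can be written down as an ordered enumeration, which makes the "no rung n without rung n−1" rule explicit. This is only a sketch: the `Level` enum and the `prerequisites` helper are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five rungs, numbered as in the table above."""
    MORPHOLOGY = 1
    SYNTAX = 2
    SEMANTICS = 3
    PRAGMATICS = 4
    INFERENCE = 5


def prerequisites(level):
    """Return every rung below `level`, lowest first."""
    return [lower for lower in Level if lower < level]


print([rung.name for rung in prerequisites(Level.SEMANTICS)])
# → ['MORPHOLOGY', 'SYNTAX']
```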

4

The example climbs the ladder

Sentence: "The dog is chasing the boy in the playground."

| Level | Output |
|---|---|
| 1 Morphology | `[The, dog, is, chasing, the, boy, in, the, playground]` |
| 2 Syntax | chasing → dog (nsubj), chasing → boy (obj) |
| 3 Semantics | {agent: dog, action: chasing, patient: boy, location: playground} |
| 4 Pragmatics | "narrating a scene" |
| 5 Inference | "the boy is probably scared" |

The bottom is automation. The top is open research.
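The first three rungs of this example can be sketched in plain Python. Several things here are assumptions made for illustration: the regex tokenizer is deliberately naive, the dependency arcs are written by hand rather than produced by a parser, the `obl` arc for "playground" and the `ROLE_OF` mapping are not part of the original example, and mapping `nsubj` to agent only holds for active-voice sentences like this one.

```python
import re

SENTENCE = "The dog is chasing the boy in the playground."

# Level 1 (morphology): a naive tokenizer that splits words and punctuation.
tokens = re.findall(r"\w+|[^\w\s]", SENTENCE)

# Level 2 (syntax): dependency arcs as (head, relation, dependent).
# Hard-coded by hand here; a real parser would predict them.
arcs = [
    ("chasing", "nsubj", "dog"),
    ("chasing", "obj", "boy"),
    ("chasing", "obl", "playground"),  # assumed label for the location phrase
]

# Level 3 (semantics): an assumed, simplified relation -> role mapping.
ROLE_OF = {"nsubj": "agent", "obj": "patient", "obl": "location"}


def to_frame(dependency_arcs):
    """Turn arcs that share one verbal head into a flat semantic frame."""
    frame = {"action": dependency_arcs[0][0]}
    for _head, relation, dependent in dependency_arcs:
        frame[ROLE_OF[relation]] = dependent
    return frame


frame = to_frame(arcs)
print(tokens)
print(frame)
```

Levels 4 and 5 are left out on purpose: nothing in the tokens or arcs says that the speaker is narrating a scene or that the boy is scared, which is why those rungs remain open research.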

5

Why language is hard

Three reasons language is brutal for computers:

  1. Common sense is omitted. "help me with the window" — open it? close it? clean it? The speaker assumes you know.
  2. Productivity → combinatorial explosion. Limitless new sentences. You can't memorise; the model must generalise.
  3. Ambiguity is the killer. "I saw the man with the telescope." Two valid readings — humans pick one without noticing.
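Points 2 and 3 can be made concrete with a small sketch. The nested tuples below are hand-written bracketings, not parser output, and `sentence_count` counts raw word sequences, so it is a loose upper bound rather than a count of grammatical sentences. All names are hypothetical.

```python
# Reading A: "with the telescope" modifies the seeing (I used the telescope).
reading_a = ("I", "saw", "the man", "with the telescope")
# Reading B: "with the telescope" modifies the man (he has the telescope).
reading_b = ("I", "saw", ("the man", "with the telescope"))


def leaves(tree):
    """Flatten a nested tuple back into its left-to-right word chunks."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sentence_count(vocab_size, length):
    """Number of distinct word sequences of a given length."""
    return vocab_size ** length


# Same surface string, two different structures.
print(" ".join(leaves(reading_a)) == " ".join(leaves(reading_b)))  # → True
print(reading_a == reading_b)                                      # → False

# A 10,000-word vocabulary already allows 10**40 ten-word sequences.
print(sentence_count(10_000, 10) == 10 ** 40)                      # → True
```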

The fry-an-egg test. Explain it to someone who has never seen an egg. Now imagine explaining it without a shared language. That's NLP's grounding problem.

6

Why ML alone isn't enough

Deep learning works shockingly well — and has real cracks:

  • Bias — if the corpus isn't curated, the model inherits and amplifies it.
  • Black box — induction at scale, not logical reasoning. Answers, not explanations.
  • No causal grounding — correlation patterns, not "because."
  • No real-world semantics or pragmatics — the model has read about coffee, never tasted it.
  • Frozen world model — BERT (trained 2018) doesn't know what COVID is. Can't infer after its cutoff.
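The frozen-world-model point can be illustrated with a toy count-based "model". This is not how BERT works internally: a subword model would split an unseen word into pieces rather than reject it, but it would still have learned nothing about what the word means. The corpus and the `known` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical training text, "frozen" before some cutoff date.
TRAINING_CORPUS = [
    "the flu spreads in winter",
    "a vaccine protects against the flu",
]

# "Training": count every word once, then never update the counts again.
word_counts = Counter(
    word for line in TRAINING_CORPUS for word in line.split()
)


def known(word):
    """True if the word appeared at least once before the cutoff."""
    return word_counts[word] > 0


print(known("flu"))    # → True
print(known("covid"))  # → False
```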

The honest take: reasoning over text is exactly where current models still struggle.

7

The two traditions (and the pendulum)

Knowledge-based

Big explicit DB of facts/rules · IBM Watson · logical, slow to scale

Statistical / DL

Transformers · LLMs · "give me data, I'll learn anything" · black-box

  • Historically knowledge-based was on top; now statistics is.
  • The pendulum has swung before. It will swing again.
  • Don't fire the linguist. The cognitive chasm — between pattern-matching text and actually understanding language — is still wide open.

8

Decision: where does your problem sit?

| Problem | Level it lives on |
|---|---|
| Tokenizer for Japanese / Chinese | 1 · Morphology |
| Parse a sentence to find subject/object | 2 · Syntax |
| Sentiment / classification / search ranking | 3 · Semantics |
| Translation between high-resource languages | 2 + 3 |
| Chatbot intent / dialogue | 4 · Pragmatics |
| Summarise medical reports + answer reasoning questions | 5 · Inference |
| Fake news / contradiction detection | 3 + 5 |

Rule of thumb. If the answer requires information that isn't in the text, you're at level 5. Lower your expectations or add a knowledge source.
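The table and the rule of thumb can be combined into a small lookup. This is a sketch under stated assumptions: the problem keys are hypothetical shorthand for the table rows, and the `needs_outside_knowledge` flag is a manual judgment by the caller, not something the code can detect.

```python
# Rows of the decision table, keyed by hypothetical short labels.
PROBLEM_LEVELS = {
    "tokenizer": {1},
    "parsing": {2},
    "sentiment": {3},
    "translation": {2, 3},
    "chatbot_intent": {4},
    "medical_reasoning_qa": {5},
    "contradiction_detection": {3, 5},
}

LEVEL_NAMES = {
    1: "Morphology",
    2: "Syntax",
    3: "Semantics",
    4: "Pragmatics",
    5: "Inference",
}


def advise(problem, needs_outside_knowledge=False):
    """Report the highest rung a problem touches, applying the rule of thumb."""
    levels = set(PROBLEM_LEVELS[problem])
    if needs_outside_knowledge:
        # Rule of thumb: information that isn't in the text means level 5.
        levels.add(5)
    top = max(levels)
    if top == 5:
        return "Level 5 (Inference): lower expectations or add a knowledge source."
    return f"Highest rung: level {top} ({LEVEL_NAMES[top]})."


print(advise("sentiment"))
print(advise("sentiment", needs_outside_knowledge=True))
```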