| Task | What it does |
|---|---|
| Question answering | Natural-language queries over a dataset. |
| Machine translation | Language A → language B (Google Translate, DeepL). |
| Text generation | Autocomplete, Smart Compose, full reports. |
| Fake news detection | Classify text as truthful or misleading. |
| Summarization | Long text → short text, meaning preserved. |
| Information extraction | Pull structured fields from free text. |
| Text classification | Spam vs not-spam, sentiment, intent. |
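
To make the information-extraction row concrete, here is a minimal sketch that pulls two structured fields out of a sentence with regular expressions. The function name `extract_fields`, the field names, and the sample sentence are invented for this illustration; production systems usually rely on trained models rather than hand-written patterns.

```python
import re

def extract_fields(text: str) -> dict:
    """Pull an ISO date and a dollar amount out of free text (toy patterns)."""
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    amount = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return {
        "date": date.group(1) if date else None,
        "amount": float(amount.group(1)) if amount else None,
    }

# Hypothetical input sentence.
print(extract_fields("Invoice issued on 2024-03-15 for $249.99, due in 30 days."))
```

Fields the patterns cannot find come back as `None`, which keeps the output shape stable for downstream code.
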
What NLP is
Definition. The automatic analysis and understanding of natural language, in service of communication between computers and humans.
- Natural language = how humans actually speak. Artificial language = Java, Python, SQL.
- NLP ≠ Neuro-Linguistic Programming. Same letters, very different thing.
- NLP = AI + Computational Linguistics (CL). Modern stack = ML / Deep Learning + CL.
- NLP = systems that understand language. NLP × ML = systems that learn to understand language.
What NLP solves
The underlying question: if I have information in natural language, can I use it to solve a business problem?
The 5 levels (the spine)
| # | Level | Question | Status |
|---|---|---|---|
| 1 | Morphology | what are the tokens? | 🟢 mostly solved |
| 2 | Syntax | how do words relate? | 🟢 mostly solved |
| 3 | Semantics | what do words together mean? | 🟡 making progress |
| 4 | Pragmatics | what's the speaker's intent? | 🔴 still really hard |
| 5 | Inference | what's true that wasn't said? | 🔴 still really hard |
Read it as a ladder. Every NLP task lives on one rung, occasionally two. You can't do rung n without rung n−1.
The example climbs the ladder
Sentence: "The dog is chasing the boy in the playground."
| Level | Output |
|---|---|
| 1 Morphology | `[The, dog, is, chasing, the, boy, in, the, playground, .]` |
| 2 Syntax | chasing → dog (nsubj), chasing → boy (obj) |
| 3 Semantics | {agent: dog, action: chasing, patient: boy, location: playground} |
| 4 Pragmatics | "narrating a scene" |
| 5 Inference | "the boy is probably scared" |
The bottom rungs are automated today. The top rungs are open research.
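
The first three rungs of the example can be sketched in plain Python. This is a hand-rolled illustration, not a real pipeline: the regex tokenizer is a simplification, the dependency edges are typed in by hand (a parser would normally produce them), and the `ROLE_OF` mapping is an assumption that only holds for simple active-voice sentences. Levels 4 and 5 are left out on purpose, since they do not reduce to a few lines of code.

```python
import re

SENTENCE = "The dog is chasing the boy in the playground."

# Level 1, morphology: split into word and punctuation tokens.
tokens = re.findall(r"\w+|[^\w\s]", SENTENCE)

# Level 2, syntax: (head, relation, dependent) edges, written by hand here.
# A real pipeline would get these from a dependency parser.
dependencies = [("chasing", "nsubj", "dog"), ("chasing", "obj", "boy")]

# Level 3, semantics: map grammatical relations onto semantic roles.
# Assumption: this mapping is only valid for active-voice sentences.
ROLE_OF = {"nsubj": "agent", "obj": "patient"}
frame = {"action": dependencies[0][0]}
for head, relation, dependent in dependencies:
    frame[ROLE_OF[relation]] = dependent
frame["location"] = tokens[-2]  # hard-coded: last word before the full stop

print(tokens)
print(frame)
```

Each level consumes the output of the one below it, which is the ladder in miniature.
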
Why language is hard
Three reasons language is brutal for computers:
- Common sense is omitted. "help me with the window" — open it? close it? clean it? The speaker assumes you know.
- Productivity → combinatorial explosion. Limitless new sentences. You can't memorise; the model must generalise.
- Ambiguity is the killer. "I saw the man with the telescope." Two valid readings — humans pick one without noticing.
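
The telescope sentence can be checked mechanically. The sketch below counts parse trees with a CKY-style chart over a toy grammar. The grammar and lexicon are assumptions written for this one sentence, not a general grammar of English; the point is only that two attachment sites for "with the telescope" yield two trees.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "with the telescope" modifies the seeing
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "with the telescope" modifies the man
    ("PP", "P", "NP"),
]
LEXICON = {"I": "NP", "saw": "V", "the": "Det", "man": "N",
           "with": "P", "telescope": "N"}

def count_parses(words, root="S"):
    """Count distinct parse trees for `words` under the toy grammar."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end)][parent] += (
                        chart[(start, split)][left] * chart[(split, end)][right]
                    )
    return chart[(0, n)][root]

print(count_parses("I saw the man with the telescope".split()))  # prints 2
```

Adding more prepositional phrases makes the count grow quickly, which is one reason ambiguity and combinatorial explosion tend to show up together.
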
The fry-an-egg test. Explain how to fry an egg to someone who has never seen one. Now imagine explaining it without a shared language. That's NLP's grounding problem.
Why ML alone isn't enough
Deep learning works shockingly well — and has real cracks:
- Bias — if the corpus isn't curated, the model inherits and amplifies it.
- Black box — induction at scale, not logical reasoning. Answers, not explanations.
- No causal grounding — correlation patterns, not "because."
- No real-world semantics or pragmatics — the model has read about coffee, never tasted it.
- Frozen world model — BERT (trained 2018) doesn't know what COVID is. Can't infer after its cutoff.
The honest take: reasoning over text is exactly where current models still struggle.
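
The frozen-world-model point can be illustrated with a deliberately crude sketch: a vocabulary fixed at training time that maps anything unseen to `<unk>`. The corpus and function names are hypothetical. Note the simplification: BERT itself uses subword tokens, so it would split an unseen word into pieces rather than emit `<unk>`, but those pieces still carry no learned knowledge of the new concept.

```python
def build_vocab(corpus):
    """Freeze the set of known words at training time."""
    return {word.lower() for sentence in corpus for word in sentence.split()}

def encode(sentence, vocab):
    """Replace every word the frozen vocabulary has never seen with <unk>."""
    return [w if w.lower() in vocab else "<unk>" for w in sentence.split()]

# Hypothetical pre-2020 training corpus.
training_corpus = ["the flu spreads in winter", "a vaccine prevents the flu"]
vocab = build_vocab(training_corpus)

print(encode("the COVID vaccine prevents severe illness", vocab))
```

Everything the model could say about the unknown words has to come from their neighbours, which is pattern-matching rather than knowledge.
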
The two traditions (and the pendulum)
Knowledge-based: big explicit DB of facts/rules · IBM Watson · logical, slow to scale
Statistical: Transformers · LLMs · "give me data, I'll learn anything" · black-box
- Historically, knowledge-based systems were on top; now statistical ones are.
- The pendulum has swung before. It will swing again.
- Don't fire the linguist. The cognitive chasm — between pattern-matching text and actually understanding language — is still wide open.
Decision: where does your problem sit?
| Problem | Level it lives on |
|---|---|
| Tokenizer for Japanese / Chinese | 1 · Morphology |
| Parse a sentence to find subject/object | 2 · Syntax |
| Sentiment / classification / search ranking | 3 · Semantics |
| Translation between high-resource languages | 2 + 3 |
| Chatbot intent / dialogue | 4 · Pragmatics |
| Summarise medical reports + answer reasoning questions | 5 · Inference |
| Fake news / contradiction detection | 3 + 5 |
Rule of thumb. If the answer requires information that isn't in the text, you're at level 5. Lower your expectations or add a knowledge source.
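
The decision table and the rule of thumb can be written down as a small helper. This is a sketch under one assumption not stated outright in the notes: that a problem is as hard as the highest rung it touches. The names `Level` and `hardest_level` are made up for the example.

```python
from enum import IntEnum

class Level(IntEnum):
    MORPHOLOGY = 1
    SYNTAX = 2
    SEMANTICS = 3
    PRAGMATICS = 4
    INFERENCE = 5

def hardest_level(levels, needs_outside_knowledge=False):
    """Return the highest rung a problem touches.

    Applies the rule of thumb: if the answer needs information that is not
    in the text, the problem sits at level 5 whatever its other levels are.
    """
    if needs_outside_knowledge:
        return Level.INFERENCE
    return max(levels)

# Translation between high-resource languages: levels 2 + 3.
print(hardest_level([Level.SYNTAX, Level.SEMANTICS]).name)
# A classification task whose answer depends on facts outside the text.
print(hardest_level([Level.SEMANTICS], needs_outside_knowledge=True).name)
```

If the helper returns `INFERENCE`, that is the cue to lower expectations or plug in a knowledge source.
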
