Module 2 of 9

Module 2: How LLMs Process Language

Why this matters

Once you understand how a model “reads” your prompt and “writes” its response, every other lesson in this course makes more sense. This module gives you the mental model that powers good prompting.

Tokens — the AI’s unit of language

AI models don’t see words. They see tokens. A token is roughly a word, part of a word, or a piece of punctuation: common words are usually a single token, while rarer words get split into several.

A rough rule: 1 token ≈ 0.75 words. So 1,000 tokens ≈ 750 words ≈ a long email.
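The rule of thumb above can be turned into a quick estimator. This is a sketch using the approximate 0.75-words-per-token ratio from this module, not a real tokenizer; a provider's tokenizer tool gives exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: 1 token is about 0.75 words (rule of thumb, not exact)."""
    word_count = len(text.split())
    return round(word_count / 0.75)

# A 750-word email lands right at the 1,000-token mark.
print(estimate_tokens("word " * 750))  # 1000
```

Real tokenizers split on subwords, so counts for unusual vocabulary or code will run higher than this estimate.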

Why this matters: API pricing and usage limits are counted in tokens, not words, and so is the context window covered in the next section.

Context windows

The context window is the maximum amount of text a model can process at once — your prompt plus its response, plus any uploaded files or chat history.

Modern context windows (as of 2026) range from roughly a hundred thousand tokens to over a million, depending on the model: enough to hold entire books or sizable codebases.

What this means practically: you can paste long documents and hold extended conversations, but the window still fills up. Once it does, the oldest content drops out of scope or gets compressed, and answer quality suffers.

A practical habit: when a conversation gets long or you’ve finished a major task, either condense it or start fresh.

Old context bloats every new prompt with the model’s earlier reasoning — which costs more tokens and increases drift. A clean (or condensed) conversation produces sharper output.
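The budgeting habit above can be sketched as a simple check. The window size and token counts here are illustrative numbers, not any specific model's limits:

```python
def fits_in_window(history_tokens: int, prompt_tokens: int,
                   reply_budget: int, window: int) -> bool:
    """True if chat history + new prompt + expected reply fit in the context window."""
    return history_tokens + prompt_tokens + reply_budget <= window

# Hypothetical 128k-token window: a bloated history leaves no room for the reply.
print(fits_in_window(history_tokens=120_000, prompt_tokens=5_000,
                     reply_budget=4_000, window=128_000))  # False: condense or start fresh
```

The key point is that the window covers everything at once, so a long history silently shrinks the space available for your prompt and the model's answer.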

How models are trained (high level)

You don’t need to be technical, but a quick mental model helps. Training happens in stages — pretraining is the heavy lift, and the next two stages are layered on top of the pretrained model afterward to shape behavior:

  1. Pretraining (the main training) — The model reads a huge portion of the internet (books, articles, websites, code). It learns patterns: which words tend to follow which, how arguments are structured, how code is written. This takes months and millions of dollars. At the end of pretraining, you have a “base model” — knowledgeable but raw, not yet useful as a chatbot.

  2. Fine-tuning (post-training) — The base model is then refined for specific tasks (chat, coding, safety). This shapes its behavior, making it helpful, harmless, and honest.

  3. RLHF (Reinforcement Learning from Human Feedback) (post-training) — Humans rate model outputs, and the model learns to produce the responses humans prefer. This is why modern AI feels conversational instead of robotic.

The model you talk to is the result of all three stages. It doesn’t learn from what you tell it in chat; its knowledge is frozen at its training cutoff date.

Why prompt structure changes output quality

Because the model predicts the next token based on everything before it, the way you structure your prompt directly shapes the answer.

A prompt like “tell me about marketing” gives the model almost no signal. Tens of thousands of marketing-related sentences could plausibly follow. The model picks something generic.

A prompt like “Explain the difference between content marketing and paid advertising for a small bakery in 200 words, in plain language, with one example each” narrows the possibility space dramatically. The model has clear constraints and produces a focused, useful answer.

Specificity is leverage. Every detail you add eliminates entire categories of bad responses.
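One way to make "specificity is leverage" concrete is a checklist-style prompt builder. The helper and its field names below are illustrative, not a standard API:

```python
def build_prompt(task: str, audience: str = "", length: str = "",
                 tone: str = "", examples: str = "") -> str:
    """Assemble a prompt from explicit constraints; empty fields are skipped."""
    fields = [("Task", task), ("Audience", audience),
              ("Length", length), ("Tone", tone), ("Examples", examples)]
    return "\n".join(f"{name}: {value}" for name, value in fields if value)

# The bakery prompt from above, as explicit constraints:
print(build_prompt(
    task="Explain the difference between content marketing and paid advertising",
    audience="a small bakery owner",
    length="200 words",
    tone="plain language",
    examples="one example of each",
))
```

Each filled-in field is a constraint that rules out whole categories of generic answers; "tell me about marketing" is the same call with every field but the task left blank.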

The two modes — instant vs thinking

Modern models like ChatGPT-5.1, Claude Opus, and Gemini 3 have two modes:

Instant — Fast default response. Best for: quick factual questions, rewording and summarizing, short drafts, and everyday back-and-forth.

Thinking (also called “Reasoning” or “Extended Thinking”) — The model spends more compute internally before answering. Best for: math and logic, multi-step analysis, debugging, planning, and anything where a wrong answer is costly.

Important: More thinking isn’t always better. For simple tasks, thinking mode can produce worse results because the model overcomplicates things. Match the mode to the task.

Key takeaways

- Models read and write tokens, not words; 1 token ≈ 0.75 words.
- The context window caps how much text (prompt, files, history, and reply) the model can handle at once. When a conversation gets long, condense it or start fresh.
- Models are built in stages: pretraining first, then fine-tuning and RLHF layered on top. The result is frozen at its training cutoff.
- Specificity is leverage: every detail in a prompt eliminates whole categories of bad responses.
- Match the mode to the task: Instant for simple requests, Thinking for hard multi-step ones.

Quick Check

1. What is a token, roughly, and how does it relate to word count?

2. Roughly, 1,000 tokens equals about how many words?

3. Which stage of training happens first and is the heaviest lift?

4. Why does prompt structure change output quality so much?

5. When should you use "Thinking" / Reasoning mode?