How LLMs Work Under the Hood
We cover tokens, embeddings, and next-token prediction — no math required, but deep enough to understand model behavior.
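To make "next-token prediction" concrete, here is a minimal sketch of the generation loop. The "model" is a hypothetical hand-written table of next-token probabilities (`NEXT_TOKEN_PROBS`, `predict_next` and `generate` are illustrative names, not any library's API). A real LLM computes these probabilities with a neural network that reads the whole context window, not just the last token. The loop itself works the same way: predict one token, append it to the context, repeat.

```python
# Hypothetical toy "model": for each last token, the probabilities of the next one.
# A real LLM computes such a distribution from the entire context window.
NEXT_TOKEN_PROBS = {
    "The": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 0.9, ".": 0.1},
    "ran": {"away": 0.9, ".": 0.1},
    "down": {".": 1.0},
    "away": {".": 1.0},
}


def predict_next(context: list[str]) -> str:
    """Return the most probable next token (greedy choice, no randomness)."""
    probs = NEXT_TOKEN_PROBS[context[-1]]
    return max(probs, key=probs.get)


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    """Autoregressive loop: each predicted token is appended to the context."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        tokens.append(nxt)
        if nxt == ".":  # stop at the end-of-sentence token
            break
    return tokens


print(" ".join(generate(["The"])))  # → The cat sat down .
```

Note that the model never "plans" the full sentence: each step only answers "what token most likely comes next, given what is already in the context?"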
Try this: in any LLM interface, ask exactly the same question three times in a row and compare the responses. You will usually see variability. If the interface lets you adjust temperature (e.g., the OpenAI Playground), run the same prompt at temperature=0 and at temperature=1 and compare how much the answers vary.
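The variability comes from sampling. The model produces a score (logit) for every candidate next token; dividing those scores by the temperature and applying softmax gives the sampling probabilities. Below is a minimal sketch of that idea, assuming hypothetical logits for a "capital of France" style prompt. Treating temperature 0 as "always pick the top token" (greedy decoding) is a common convention; real services may still show small differences between runs at temperature 0.

```python
import math
import random

# Hypothetical raw scores (logits) a model might assign to candidate next tokens.
LOGITS = {"Paris": 4.0, "Lyon": 2.0, "Marseille": 1.5, "banana": -2.0}


def next_token_distribution(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(logits, key=logits.get)
    probs = next_token_distribution(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(42)
for t in (0, 1.0):
    runs = [sample(LOGITS, t, rng) for _ in range(5)]
    print(f"temperature={t}: {runs}")
```

At temperature 0 every run prints "Paris"; at temperature 1 the less likely tokens occasionally appear. Lower temperatures sharpen the distribution toward the top token, higher ones flatten it, which is why temperature changes variety rather than quality.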
Copy the prompt below and adapt it to your context; replace the text in angle brackets with your own.
Continue the following text in three different ways — formal, conversational, and creative. Show all three versions: <paste your text opening here>
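- Thinking the model "remembers" things the way a human does. It only sees what is in the current context window; anything outside that window is invisible to it.
- Confusing temperature with "intelligence". A higher temperature doesn't make the model smarter, only more varied, because it just changes how randomly the next token is picked.
- Assuming 1 token = 1 word. Tokens are often pieces of words (for English, OpenAI's rule of thumb is roughly 4 characters, or about 3/4 of a word, per token), so this assumption leads to wrong estimates of context limits and cost.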
- For tasks with a single correct answer (code, SQL, JSON), use temperature=0 or close to it.
- For brainstorming, variations, stories — raise temperature to 0.8–1.0.
- Use tokenizer.openai.com to count tokens in a text — essential when working with large documents.
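Why tokens and words don't match: tokenizers split text into subword pieces from a fixed vocabulary. The sketch below uses a tiny hypothetical vocabulary and a simple greedy longest-match rule; real tokenizers (for example BPE-based ones) learn vocabularies of tens of thousands of pieces and often attach the space to the following word instead of treating it as its own token. For exact counts on OpenAI models, use the tokenizer page from the last tip or the `tiktoken` library rather than this toy.

```python
# Hypothetical toy vocabulary of subword pieces (not any real model's vocabulary).
VOCAB = {"un", "believ", "able", "token", "ization", " ", "!"}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match tokenization; unknown characters become one-char tokens."""
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # fallback: single character
            i += 1
    return tokens


text = "unbelievable tokenization!"
tokens = tokenize(text)
print(tokens)
print(f"words: {len(text.split())}, tokens: {len(tokens)}")  # → words: 2, tokens: 7
```

Two words become seven tokens here, so an estimate based on word count would be off by more than a factor of three. Limits and prices are counted in tokens, not words.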
This is a theory lesson — understanding the mechanism helps you write better prompts and interpret model behavior correctly.
You don't need to hold the full mechanics in your head for every prompt — just keep the key implications in mind: tokens, context, temperature.