TL;DR · Key Takeaways
- The jargon is not random. Almost every term belongs to one of two moments: building the model or using it.
- Build time gives you training, parameters, fine-tuning, and the labels LLM and foundation model.
- Use time gives you prompt, token, context, inference, and the failure mode everyone fears, hallucination.
- Embedding is the bridge: it is how words get turned into numbers so the math can happen at all.
The fastest way to feel lost in an AI conversation is the vocabulary. People throw around tokens, parameters, inference, and embeddings as if everyone agreed on them years ago. The good news is that the words are not as scattered as they sound. Nearly all of them describe one of two moments in a model’s life: the moment it is built, and the moment you use it. Sort each term into the right moment and the jargon stops being a wall and starts being a map. That sorting is what this part does.
The thing itself: model and parameters
Start with the model. We give it a full part of its own later, but the short version is that a model is a giant mathematical function that turns an input into a prediction. It is not a database of answers. It is a set of patterns squeezed into numbers.
Those numbers are the parameters. Think of a mixing desk in a studio with billions of dials. Each parameter is one dial, and the exact position of every dial is what makes this model behave the way it does. When you read that a model is “8 billion” or “70 billion”, that figure is the number of dials. More dials usually means more capacity to capture nuance, though it is a rough guide rather than a guarantee of quality. The whole point of building a model is to find good settings for all those dials, and that search has a name of its own.
Build time: training, fine-tuning, and the labels
Training is the process of setting those dials by example. The model is shown enormous amounts of text, makes a guess, gets told how wrong it was, and nudges its dials a little to be less wrong next time. Repeat that billions of times and the parameters settle into positions that capture real patterns of language. Training is slow and expensive, and for a given model it happens once. It is the months of studio work before the album ships.
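To make the guess, measure, nudge loop concrete, here is a minimal sketch in Python. It assumes a toy model with only two dials, `w` and `b`, learning a hidden rule (y = 2x + 1) by gradient descent on squared error. Real LLM training follows the same basic idea at vastly larger scale, with a different loss and far more machinery.

```python
# A minimal sketch of training: guess, measure the error, nudge the dials.
# Assumption: a toy "model" with just two parameters, w and b, predicting w*x + b.
# Real LLMs apply the same idea to billions of parameters with a different loss.

def train(data, steps=2000, lr=0.05):
    w, b = 0.0, 0.0  # the dials start at arbitrary positions
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b
        grad_w = grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y  # how wrong the current guess is
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Nudge each dial a little in the direction that reduces the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

examples = [(x, 2 * x + 1) for x in range(-3, 4)]  # hidden rule: y = 2x + 1
w, b = train(examples)
print(round(w, 3), round(b, 3))  # the dials settle close to 2 and 1
```

Nobody tells the model the rule; it only ever sees examples and how wrong it was, and the dials drift toward values that reproduce the pattern.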
Fine-tuning is a shorter, more targeted round of the same thing. You take a model that already knows language broadly and train it a bit more on a narrow set of examples, say your company’s support tickets or a particular writing style. It does not rebuild the model; it adjusts it. We compare fine-tuning against other options in a later part, because people reach for it more often than they should.
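The difference from training is mostly the starting point and the scale. The sketch below is an illustration under stated assumptions, not a recipe. It assumes the same two-dial toy model, starts from hypothetical “pretrained” dials, and runs a short, gentle round of extra gradient steps on a small, made-up niche dataset.

```python
# A minimal sketch of fine-tuning, assuming the same two-dial toy model (w*x + b).
# Instead of starting from scratch, we start from "pretrained" dials and run a
# short, gentle round of extra training on a small, narrow dataset.

def fine_tune(w, b, data, steps=50, lr=0.01):
    n = len(data)
    for _ in range(steps):
        # Both gradients are computed from the current dials before either moves
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

pretrained = (2.0, 1.0)                         # hypothetical broadly trained dials
narrow = [(x, 2 * x + 1.5) for x in (1, 2, 3)]  # hypothetical niche data
tuned = fine_tune(*pretrained, narrow)
print(pretrained, "->", tuple(round(p, 3) for p in tuned))
```

The dials move only a little from where they started, which is the point: the broad knowledge stays, and the narrow data shifts it.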
That broad, pre-trained starting point has a name: a foundation model. It is a model trained on a wide sweep of general data so it can be pointed at many tasks rather than one. When the foundation model works on text, we usually call it a large language model, or LLM. So the relationship is simple: an LLM is a foundation model whose specialty is language. GPT-4o and Llama 3.1 are LLMs. A general image model is a foundation model that is not an LLM. The words stack rather than compete.
Use time: prompt, token, and context
Now the model is built and frozen. Everything from here is about feeding it. A prompt is simply the text you give it: your question, your instruction, the document you paste in. Whatever you type becomes the model’s starting point.
Before the model can work with your prompt, it chops the text into tokens. A token is a small chunk, often a whole word, sometimes part of one, sometimes just a punctuation mark. “Cat” is one token. “Unbelievable” might split into “un”, “believ”, and “able”. The model never sees letters or whole sentences; it sees a stream of these tokens, and almost everything about cost and length is measured in them.
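Here is a toy tokenizer that shows the chopping idea. It assumes a tiny, hand-written vocabulary and a simple greedy longest-match rule. Real tokenizers, such as byte-pair encoding schemes, learn their vocabularies from data, so actual splits will differ.

```python
# A toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# Assumption: real tokenizers learn their vocabularies from data and split
# differently; this only illustrates the idea of words becoming chunks.

VOCAB = {"cat", "un", "believ", "able", "!"}  # hypothetical vocabulary

def tokenize(text, vocab=VOCAB):
    tokens, i = [], 0
    longest = max(len(v) for v in vocab)
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # No vocabulary entry matched: fall back to a single-character token
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("cat"))           # ['cat']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

The single-character fallback is one simple way to guarantee every input can be tokenized; it also shows why unusual words cost more tokens than common ones.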
The context is everything the model can hold in view at once: your prompt, the earlier turns of the conversation, and its own reply so far, all measured in tokens. It is the model’s short-term working memory. Once a conversation grows past the size of that window, the earliest parts typically fall out of view, which is why a long chat can seem to forget how it began. We devote a whole part to that window and its limits later.
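A minimal sketch of the “falling out of view” effect follows. It assumes the simplest possible policy: keep only the most recent tokens. The window size and conversation are hypothetical, and real applications may summarise or trim differently.

```python
# A minimal sketch of a fixed context window, assuming the application simply
# keeps the most recent tokens and drops the oldest ones when the limit is hit.
# Real systems may summarise, refuse, or trim in other ways.

def fit_to_window(tokens, window_size):
    """Return the tokens the model can still 'see': the last window_size of them."""
    if window_size <= 0:
        return []  # guard: tokens[-0:] would wrongly return the whole list
    return tokens[-window_size:]

conversation = ["Hi", ",", " my", " name", " is", " Ada", ".",
                " What", "'s", " my", " name", "?"]
visible = fit_to_window(conversation, window_size=6)
print(visible)  # the part that contained the name has fallen out of view
```

With a six-token window, the question survives but the answer to it does not, which is exactly the “forgetting” a long chat can show.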
Running it: inference, and why it costs
Inference is the word for actually running the model to get an answer. Training built the model, inference uses it. Every time you press enter, an inference happens: the model reads your tokens and generates new ones in reply. This matters more than it sounds, because training is a one-time bill while inference happens on every single request, forever. A later part is devoted entirely to that asymmetry, since it is where most of the real-world money goes.
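Generating a reply is a loop: predict one token, append it, and predict again. The toy sketch below assumes a hand-written table of next-token scores standing in for a real network, and picks the top-scoring token each time (greedy decoding). Each pass through the loop stands in for one full run of the model, which is why longer answers cost more.

```python
# A toy sketch of inference as a loop: read the tokens so far, predict one more,
# append it, repeat. Assumption: the "model" is a hand-written lookup table of
# next-token scores, standing in for a real network's forward pass.

NEXT_TOKEN_SCORES = {  # hypothetical, made-up scores
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "<end>": 0.1},
    "down": {"<end>": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):  # one "forward pass" per new token
        scores = NEXT_TOKEN_SCORES.get(tokens[-1], {"<end>": 1.0})
        next_token = max(scores, key=scores.get)  # greedy: take the top score
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Real models look at the whole context rather than just the last token, and often sample instead of always taking the top choice, but the one-token-at-a-time loop is the same.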
One term sits between the words and the math and deserves a moment: the embedding. A model cannot do arithmetic on the word “cat” directly, so each token is turned into a list of numbers, a set of coordinates that places its meaning in a vast space. Words with similar meanings land near each other, which is how a model can tell that “king” and “queen” are related while “king” and “bicycle” are not. Embeddings are the translation layer that makes everything else possible, and they get their own part too.
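“Near each other” can be measured. A common choice is cosine similarity, which compares the direction of two vectors. The sketch below assumes three made-up, three-number embeddings chosen by hand so the example is easy to read; real embeddings are learned and much longer.

```python
import math

# Toy embeddings: hand-picked 3-number coordinates for illustration only.
# Real embeddings are learned and have hundreds or thousands of dimensions.
EMBEDDINGS = {
    "king":    [0.9, 0.8, 0.1],
    "queen":   [0.85, 0.82, 0.15],
    "bicycle": [0.1, 0.05, 0.95],
}

def cosine_similarity(a, b):
    """1.0 means same direction; values near 0 mean unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]), 3))    # close to 1
print(round(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["bicycle"]), 3))  # much lower
```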
The word for when it goes wrong: hallucination
The last term everyone trips over is hallucination. It is the name for when a model states something false with total confidence: a made-up citation, a wrong date, a quote nobody said. It is not lying, because lying needs a notion of truth to push against, and the model has none. It is doing exactly what it always does, producing a plausible next token, except this time plausible and true happen to part ways. Because the same machinery that writes a correct sentence writes the wrong one, the model gives you no warning. That is why hallucination gets its own dedicated part later, and why “check it against a real source” is the single most useful habit in this whole field.
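A deliberately tiny illustration of “plausible and true part ways” follows. It assumes a “model” that only knows which word most often followed a phrase in its made-up training text. Real hallucinations have more causes than word frequency, but the key property is the same: nothing in the mechanism checks truth.

```python
from collections import Counter

# A toy illustration, assuming a "model" that has only learned which word most
# often followed a phrase in its (made-up) training text. Nothing in it stores
# or checks whether a continuation is true.
training_text = [
    "the capital of australia is sydney",    # a common mistake, repeated often
    "the capital of australia is sydney",
    "the capital of australia is canberra",  # the correct fact, seen less often
]

def most_plausible_next_word(prefix, corpus):
    counts = Counter()
    for sentence in corpus:
        if sentence.startswith(prefix + " "):
            counts[sentence[len(prefix) + 1:].split()[0]] += 1
    if not counts:
        return None
    word, _ = counts.most_common(1)[0]
    return word  # returned with no warning and no fact check

print(most_plausible_next_word("the capital of australia is", training_text))  # sydney
```

The wrong answer comes out in exactly the same way a right one would, with no signal that anything went wrong.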
Go Deeper (optional, for technical readers)
People use parameters, weights, and activations loosely, but they are three different things. Weights (together with biases) are the learned numbers stored in the model. They are the parameters: fixed after training, identical for every request, and they are what the file on disk actually contains. A “7B model” has roughly 7 billion of them.
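Counting parameters is simple bookkeeping. A fully connected layer with `n_in` inputs and `n_out` outputs stores `n_in * n_out` weights plus `n_out` biases. The layer sizes below are hypothetical. Real transformers also include embedding tables and other pieces, and some modern LLMs drop biases entirely, so treat this as a sketch of the arithmetic, not a model card.

```python
# Counting parameters for fully connected layers: weights plus biases.
# The layer sizes are hypothetical; a "7B" model is the same bookkeeping
# summed over far larger and more numerous layers.

def dense_layer_params(n_in, n_out):
    weights = n_in * n_out  # one weight per input-output connection
    biases = n_out          # one bias per output
    return weights + biases

layer_sizes = [512, 2048, 512]  # hypothetical: a small two-layer block
total = sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 2099712
```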
Activations are different. They are the intermediate values computed during a single forward pass, as your specific tokens flow through the layers. They are temporary, they differ for every prompt, and they exist only for the duration of that inference. A rough analogy: weights are the wiring of a calculator; activations are the numbers lighting up on the display while you press keys. This distinction matters for memory. Weights set the baseline footprint you need just to load the model, while activations, together with the key-value cache that keeps per-token intermediate results and grows with the context, drive how much extra memory each request consumes. Later parts of the series, on the context window, quantization, and the memory wall, all turn on exactly this split between what is stored once and what is computed every time.
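A back-of-envelope calculation makes the split concrete. The sketch assumes 16-bit (2-byte) numbers and a hypothetical 7B configuration (32 layers, 32 key-value heads, head size 128). The cache formula counts one key and one value vector per head, per layer, per token. It ignores other activations and framework overhead, so real figures will be higher.

```python
# Back-of-envelope memory estimate, assuming 16-bit (2-byte) numbers and a
# hypothetical 7B-parameter configuration. Weights are paid once per loaded
# model; the key-value (KV) cache grows with every token a request keeps.

def weight_bytes(n_params, bytes_per_param=2):
    return n_params * bytes_per_param

def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    # Factor 2: one key vector and one value vector per head, per layer, per token
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens

GB = 1e9
weights = weight_bytes(7e9)
for tokens in (1_000, 8_000, 32_000):
    kv = kv_cache_bytes(tokens, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"{tokens:>6} tokens: weights {weights / GB:.1f} GB + KV cache {kv / GB:.2f} GB")
```

The weights line never changes, while the cache grows linearly with context length. For a single long request it can rival the weights themselves, which is why the later parts on context and memory keep returning to it.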
New here? This is Part 2 of a 30-part walk from zero to the infrastructure behind production AI. If a term above felt rushed, the Generative AI Complete Guide maps out which later part covers it in full. New to the idea itself? Start with Part 1, what generative AI actually is.
The terms at a glance
| Term | Plain meaning | Stage |
|---|---|---|
| Model | A giant function that predicts the next token | Use |
| Parameters | The learned numbers (dials) inside the model | Build |
| Training | Setting the dials from examples | Build |
| Fine-tuning | Extra targeted training on a narrow set | Build |
| Foundation model | A broad model pointed at many tasks | Build |
| LLM | A foundation model specialised in language | Build |
| Prompt | The text you give the model | Use |
| Token | A small chunk of text the model reads | Use |
| Context | Everything the model can see at once | Use |
| Inference | Running the model to get an answer | Use |
| Embedding | A token turned into meaning-coordinates | Bridge |
| Hallucination | Confident output that is false | Use |
The Bottom Line
The vocabulary of generative AI is not a pile of unrelated buzzwords. It is two short stories. Building the model gives you training, parameters, fine-tuning, and the labels foundation model and LLM. Using the model gives you prompt, token, context, inference, and the risk of hallucination, with embeddings as the bridge that turns words into numbers in between. Keep that build-then-use split in your head and you can place almost any new term you meet. Next, we trace how the field actually got here, from rigid if-statements to ChatGPT. Which of these words has been tripping you up the most?
References
- Tokenizer: see how text splits into tokens (OpenAI)
- What are foundation models? (IBM Research)
- Embeddings: meaning as coordinates (Google Machine Learning)