The Lifecycle of an LLM

Large language models aren’t magic; they’re fast pattern machines trained at ridiculous scale. This visual walks through that lifecycle, from raw text to the responses we see in chat.

LLM Lifecycle

On the left is training. The model “reads” billions of pages of text: books, articles, code, documentation. It doesn’t know facts the way we do; it only sees long streams of symbols. That text is chopped into tokens and mapped to numbers so the network can process it efficiently.

For every position in a sentence, the model tries to guess the next token. It outputs a probability distribution: maybe “the” is most likely, then “a”, then “is”, and so on. We compare that guess with the actual next token, compute how wrong it was, and push that error back through the network. Tiny adjustments to billions of parameters make the model a little less wrong. Repeat that loop for ages across huge datasets and clusters, and you end up with a system that’s extremely good at next‑token prediction.

On the right is inference — what happens when you ask a question. Your prompt gets tokenized into the same kind of numerical IDs and sent through the trained model. The model produces next‑token probabilities again, but this time we sample a single token, append it to the prompt, and feed the updated sequence back in. Temperature and similar settings control how adventurous that sampling is. The model keeps doing this — predict, sample, append — until it emits an end‑of‑sequence token or hits a length limit.

What looks like flowing prose or step‑by‑step reasoning is just that loop, running at high speed, guided by a network that has absorbed an enormous amount of structure and pattern from its training run.