# The AI Encyclopedia

> A free, open, interactive encyclopedia of AI: machine learning from first
> principles, the complete LLM field manual, prompting, and agent engineering,
> with runnable Python and 70+ live instruments.

https://ai-encyclopedia.com

## Volumes

- [STATS · Probability](https://ai-encyclopedia.com/stats/01-probability.html): Probability gives degrees of belief an arithmetic, built on three axioms. Conditioning is the operation that turns prior belief into posteri…
- [STATS · Distributions](https://ai-encyclopedia.com/stats/02-distributions.html): A handful of named distributions account for most randomness in practice: coin flips, queue arrivals, measurement noise, market returns. Eac…
- [STATS · Correlation & Causation](https://ai-encyclopedia.com/stats/03-descriptive-correlation.html): Correlation measures how two variables move together. Moving from correlation to causation requires a causal model, not more data. This chap…
- [STATS · Statistical Inference & Hypothesis Testing](https://ai-encyclopedia.com/stats/04-inference-testing.html): You never observe the population, only a sample. Inference draws conclusions about the whole from the part, and it attaches a measure of its…
- [STATS · Bayesian Inference](https://ai-encyclopedia.com/stats/05-bayesian.html): Frequentist statistics treats a parameter as a fixed unknown and the data as random; Bayesian inference reverses this. A parameter becomes a…
- [STATS · Linear Algebra for Machine Learning](https://ai-encyclopedia.com/stats/06-linear-algebra.html): Beneath the framework and the GPU, almost every model in this encyclopedia is the same object: a stack of linear maps with some nonlinearity…
- [STATS · Markov Chains & MCMC](https://ai-encyclopedia.com/stats/07-markov-chains.html): A process that forgets its past, a property called memorylessness, is enough to model PageRank and language and to sample from distributions…
- [STATS · Information Theory](https://ai-encyclopedia.com/stats/08-information-theory.html): In 1948 Claude Shannon laid a foundation that still governs machine learning. He measured surprise as a number, entropy, and proved it is th…
- [DATA · The Data Problem](https://ai-encyclopedia.com/data/01-the-data-problem.html): Most experiments are decided before a single gradient is computed. A model is only as trustworthy as its data split, and the most expensive…
- [DATA · Missing Data & Imputation](https://ai-encyclopedia.com/data/02-missing-data.html): Real datasets arrive with holes, and the holes are rarely random. How a value went missing constrains how you may fill it, and naive mean-im…
- [DATA · Encoding, Scaling & Transforms](https://ai-encyclopedia.com/data/03-encoding-scaling.html): Models consume numbers, so the encoding of categories and the scaling of features often matters more than the choice of model. A linear mode…
- [DATA · Feature Engineering & Selection](https://ai-encyclopedia.com/data/04-feature-engineering.html): The right feature can let a linear model beat a neural net. Feature engineering is the point where domain knowledge enters the math. This ch…
- [DATA · Imbalanced Data](https://ai-encyclopedia.com/data/05-imbalanced.html): When 1 case in 1000 is the one that matters, as with fraud, disease, or default, accuracy stops being informative. Accuracy misleads under i…
- [VOL I · 01 · Learning from Data](https://ai-encyclopedia.com/ml/01-learning-from-data.html): Regression lines, neural networks, and trillion-parameter language models all run on one idea. Instead of writing the rules yourself, you wr…
- [VOL I · 02 · Linear Regression & Gradient Descent](https://ai-encyclopedia.com/ml/02-linear-regression.html): One model, a weighted sum. One loss, squared error. One algorithm, step downhill. Linear regression is the smallest setting in which the ful…
- [VOL I · 03 · Classification: Logistic & Softmax](https://ai-encyclopedia.com/ml/03-classification.html): Chapter 02 predicted numbers. Most tasks instead ask for a choice: spam or not, benign or malignant, which of 100,000 tokens comes next. The…
- [VOL I · 04 · Trees, Forests & Neighbors](https://ai-encyclopedia.com/ml/04-trees-and-neighbors.html): Not every model is a curve bent by gradient descent. This chapter covers methods that keep the training data instead of compressing it away:…
- [VOL I · 05 · Clustering & Dimensionality](https://ai-encyclopedia.com/ml/05-unsupervised.html): Every method so far was handed the right answers. This chapter takes them away. With no labels, the model must find structure the data carri…
- [VOL I · 06 · Generalization: Bias, Variance & Regularization](https://ai-encyclopedia.com/ml/06-generalization.html): Any model with enough knobs can score perfectly on data it has already seen. That is memorization, and it is worth nothing. The only error t…
- [VOL I · 07 · Neural Networks: The MLP](https://ai-encyclopedia.com/ml/07-neural-networks.html): A linear model can draw exactly one flat boundary, and four points are enough to defeat it. The fix is small. Stack two linear maps with a n…
- [VOL I · 08 · Backpropagation & Optimization](https://ai-encyclopedia.com/ml/08-backpropagation.html): Chapter 07 left a network with thirty-three knobs and one number telling it how wrong it is. This chapter is the algorithm that turns that o…
- [VOL I · 09 · Naive Bayes & Generative Classifiers](https://ai-encyclopedia.com/ml/09-naive-bayes.html): Most classifiers learn where to draw the line between classes. A generative classifier instead learns to model each class, then asks which c…
- [VOL I · 10 · Support Vector Machines & the Kernel Trick](https://ai-encyclopedia.com/ml/10-svm-kernels.html): Of all the lines that separate two classes, the SVM picks the one sitting in the widest empty corridor. Maximize that margin and a handful o…
- [VOL I · Distance & Similarity Metrics](https://ai-encyclopedia.com/ml/11-distances-similarity.html): k-NN, every clustering algorithm, and every vector search rest on one decision that usually goes unexamined: how you measure "close". That s…
- [VOL I · The Clustering Zoo](https://ai-encyclopedia.com/ml/12-clustering-zoo.html): k-means is fast and simple, but its squared-distance-to-a-centre objective can only carve the plane into round, equal, convex blobs. Hand it…
- [VOL I · 13 · Matrix Factorization & SVD](https://ai-encyclopedia.com/ml/13-matrix-factorization.html): A ratings table, a term-document count, a pixel grid, an adjacency matrix: most large matrices that show up in practice are not full-rank. T…
- [VOL I · 14 · Ensemble Methods](https://ai-encyclopedia.com/ml/14-ensembles.html): A single tree overfits, a single shallow model underfits, and any single model is a single point of failure. Combine many weak models the ri…
- [VOL I · Gradient Boosting in Practice](https://ai-encyclopedia.com/ml/15-boosting-libraries.html): Open any tabular-data leaderboard and the top is usually the same three names. All three implement one idea, gradient boosting, and differ m…
- [MLOPS · Resampling & Cross-Validation](https://ai-encyclopedia.com/mlops/01-resampling-cv.html): Holding out one slice of the data and scoring on it gives a number, but that number is itself a random draw. A different split would have pr…
- [MLOPS · Hyperparameter Tuning](https://ai-encyclopedia.com/mlops/02-hyperparameter-tuning.html): Training fits the model parameters. The hyperparameters, such as learning rate, tree depth, and regularization strength, are set by a search…
- [MLOPS · Metrics](https://ai-encyclopedia.com/mlops/03-regression-classification-metrics.html): A metric is not just a final report; it is the objective the pipeline optimizes toward, and it determines which errors the model is willing…
- [MLOPS · Ranking, Calibration, ROC, KS & PSI](https://ai-encyclopedia.com/mlops/04-ranking-calibration.html): A scoring model makes two separate promises, and most teams check only the first. One is correct ordering, placing risky cases above safe on…
- [MLOPS · Stability & Drift](https://ai-encyclopedia.com/mlops/05-stability-drift.html): A model is trained once on a fixed snapshot, then deployed into an environment that keeps changing. As the input distribution and the input-…
- [MLOPS · Explainability](https://ai-encyclopedia.com/mlops/06-explainability.html): A model that predicts well is not the same as a model you can account for. When a loan is denied, a tumour flagged, or a transaction blocked…
- [MLOPS · MLOps & Model Governance](https://ai-encyclopedia.com/mlops/07-mlops-governance.html): Training a model is the easy part. Keeping it trustworthy after the notebook closes requires a reproducible pipeline, a registry that record…
- [DL · Deep Learning Foundations](https://ai-encyclopedia.com/dl/01-foundations.html): A network with enough layers can in principle represent almost any function, yet for years deep stacks could not be trained. Activations and…
- [DL · Convolutional Neural Networks](https://ai-encyclopedia.com/dl/02-cnn.html): A photograph carries structure that a dense layer discards: nearby pixels belong together, and an object keeps its identity wherever it sits…
- [DL · Sequence Models](https://ai-encyclopedia.com/dl/03-sequence-models.html): A feed-forward network takes one fixed-size input and retains nothing between examples. A recurrent network reads a sequence one step at a t…
- [DL · Seq2Seq & the Birth of Attention](https://ai-encyclopedia.com/dl/04-seq2seq-attention.html): An encoder reads a sentence and a decoder writes its translation. The 2014 design made both recurrent networks and passed a single state vec…
- [DL · Autoencoders & VAEs](https://ai-encyclopedia.com/dl/05-autoencoders.html): Force a network to reconstruct its input through a narrow bottleneck and it learns the data's hidden coordinates, the few axes along which t…
- [DL · Generative Adversarial Networks](https://ai-encyclopedia.com/dl/06-gans.html): Most generative models estimate how likely the data is and climb that gradient. GANs discard the likelihood entirely. Adversarial training p…
- [DL · Training Deep Networks in Practice](https://ai-encyclopedia.com/dl/07-training-deep-nets.html): A network's architecture decides what it can represent; training decides whether it gets there. The optimizer, the learning-rate schedule, a…
- [RL · The Reinforcement Learning Problem](https://ai-encyclopedia.com/rl/01-the-rl-problem.html): Supervised learning hands the model a fixed set of labeled examples to imitate. Reinforcement learning gives an agent only a scalar reward a…
- [RL · Dynamic Programming](https://ai-encyclopedia.com/rl/02-dynamic-programming.html): The previous chapter posed the control problem and the value functions that summarize it. This chapter solves it exactly, under one strong a…
- [RL · Model-Free Value Methods](https://ai-encyclopedia.com/rl/03-model-free-value.html): Dynamic programming could solve any MDP, provided you handed it the transition probabilities and rewards. Real agents are not handed the mod…
- [RL · Policy Gradients & Actor-Critic](https://ai-encyclopedia.com/rl/04-policy-gradients.html): Every method so far has been indirect: estimate how good each action is, then act greedily with respect to those estimates. Policy-gradient…
- [RL · Deep Reinforcement Learning](https://ai-encyclopedia.com/rl/05-deep-rl.html): The tabular methods of the earlier chapters store one number per state, which is impractical the instant the state is a screen of pixels or…
- [RL · RL Meets LLMs](https://ai-encyclopedia.com/rl/06-rl-and-llms.html): For most of this volume the agent acted in a maze, a game, or a control loop. Now the environment is a conversation and the agent is a langu…
- [GAME · Games & Equilibria](https://ai-encyclopedia.com/game-theory/01-games-equilibria.html): In single-agent optimization, "optimal" means picking the action with the highest payoff. Once a player's reward depends on what other ratio…
- [GAME · Repeated & Cooperative Games](https://ai-encyclopedia.com/game-theory/02-repeated-cooperative.html): In a single encounter, rational self-interest can drive two players to an outcome both of them reject, as the Prisoner's Dilemma demonstrate…
- [GAME · Games in AI](https://ai-encyclopedia.com/game-theory/03-games-in-ai.html): Supervised learning is bounded by its teacher: a model can only chase the labels a human already wrote. Framing learning as a game lets the…
- [TIME · Time Series Fundamentals](https://ai-encyclopedia.com/timeseries/01-fundamentals.html): Most models assume the rows are interchangeable, so shuffling them loses nothing. Attach a clock and that assumption fails: yesterday shapes…
- [TIME · AR, MA, ARIMA & SARIMA](https://ai-encyclopedia.com/timeseries/02-arima.html): Before neural networks reached forecasting, Box and Jenkins reduced it to a procedure that could be taught. The recipe has three steps: diff…
- [TIME · Exponential Smoothing & Holt-Winters](https://ai-encyclopedia.com/timeseries/03-exponential-smoothing.html): Where ARIMA works through correlations of past errors, exponential smoothing makes a simpler assumption and performs well on it. It weights…
- [TIME · Volatility Modeling](https://ai-encyclopedia.com/timeseries/04-volatility-garch.html): Returns are close to unforecastable, but their size is not. Volatility clusters: calm periods follow calm periods and large moves follow lar…
- [TIME · Multivariate Time Series](https://ai-encyclopedia.com/timeseries/05-multivariate.html): When several series move together, a Vector Autoregression captures their feedback: every variable is regressed on the recent past of all th…
- [TIME · Forecasting in Practice](https://ai-encyclopedia.com/timeseries/06-forecasting-practice.html): Every earlier chapter showed you how to fit a model to a time series. This one covers how to judge whether it is any good, evaluated on data…
- [QUANT · Stochastic Processes](https://ai-encyclopedia.com/quant/01-stochastic-processes.html): In continuous time, prices follow a path that is jagged at every scale, so the ordinary calculus of smooth curves no longer applies. Itô's l…
- [QUANT · Binomial Option Pricing](https://ai-encyclopedia.com/quant/02-binomial-pricing.html): An option appears to depend on whether the stock rises or falls, yet its price does not. You can price an option without knowing the stock's…
- [QUANT · Black–Scholes & the Greeks](https://ai-encyclopedia.com/quant/03-black-scholes.html): Black, Scholes and Merton established a single no-arbitrage argument with one striking consequence. A stock's volatility, not its expected r…
- [QUANT · Interest-Rate Models](https://ai-encyclopedia.com/quant/04-interest-rate-models.html): In Black–Scholes the rate \(r\) was a constant you looked up; here it becomes the quantity being modelled. The field rests on one stylized f…
- [QUANT · Monte Carlo Methods in Finance](https://ai-encyclopedia.com/quant/05-monte-carlo.html): A price is an expectation, and an expectation can be estimated by averaging samples. When no closed form exists, Monte Carlo simulates many…
- [QUANT · Risk Measurement](https://ai-encyclopedia.com/quant/06-risk-measurement.html): Every portfolio has a distribution of next-day outcomes, and market-risk management summarizes the loss tail of that distribution into figur…
- [VOL II · 01 · Foundations](https://ai-encyclopedia.com/chapters/01-foundations.html): Underneath the scale and engineering, a large language model computes one function: given a sequence of tokens, it returns a probability dis…
- [VOL II · 02 · The Transformer](https://ai-encyclopedia.com/chapters/02-transformer.html): Every frontier model is a decoder-only transformer: a stack of identical blocks that read from and write to a shared workspace called the re…
- [VOL II · 03 · Attention](https://ai-encyclopedia.com/chapters/03-attention.html): Attention performs a differentiable soft lookup. Every position publishes what it holds (keys, values) and what it wants (queries), and info…
- [VOL II · 04 · Pre-training](https://ai-encyclopedia.com/chapters/04-pretraining.html): Pre-training spends a compute budget, months of time on tens of thousands of accelerators, to push cross-entropy as low as physics and econo…
- [VOL II · 05 · Post-training](https://ai-encyclopedia.com/chapters/05-posttraining.html): A base model knows things, but it does not yet behave. Post-training is the comparatively small but decisive stage that converts a next-toke…
- [VOL II · 06 · Fine-tuning](https://ai-encyclopedia.com/chapters/06-finetuning.html): Adapting a pre-trained model to your task is mostly a question of which parameters you allow to move. This chapter covers the spectrum from…
- [VOL II · 07 · Compression](https://ai-encyclopedia.com/chapters/07-compression.html): Generating a token requires streaming every weight through the chip. At decode time LLMs are memory-bandwidth-bound, so the bits each weight…
- [VOL II · 08 · Inference & Deployment](https://ai-encyclopedia.com/chapters/08-inference.html): Serving has its own physics: a compute-bound prefill followed by a memory-bound decode, with cost riding on cache management, batching polic…
- [VOL II · 09 · The Frontier](https://ai-encyclopedia.com/chapters/09-frontier.html): The dense decoder of Chapters 02 and 03 is now the baseline rather than the frontier. Production flagships route tokens through expert subne…
- [VOL II · 10 · Diffusion](https://ai-encyclopedia.com/chapters/10-diffusion.html): The second major family of generative models works differently from next-token prediction. The procedure is to destroy data with noise, then…
- [VOL II · The 2026 Frontier](https://ai-encyclopedia.com/chapters/11-frontier-2026.html): For eight years the Transformer had no serious rival. That changed. State-space models now match attention's quality at linear cost, and pos…
- [VOL II · Capstone · The Full Stack](https://ai-encyclopedia.com/chapters/capstone.html): Ten chapters compress into two instruments. First, design a frontier model, where every slider invokes an equation you have already met, fro…
- [VOL III · 01 · How Models Read Prompts](https://ai-encyclopedia.com/prompting/01-how-prompts-work.html): A prompt functions as the condition in a conditional probability: every token you write reshapes the distribution over the tokens the model…
- [VOL III · 02 · The Scaffold: Role · Task · Context · Format · Constraints](https://ai-encyclopedia.com/prompting/02-the-scaffold.html): A weak prompt is usually short on information, not cleverness. The scaffold is a five-part checklist that puts the conditioning the model ca…
- [VOL III · 03 · Show, Don't Tell: Few-Shot & Examples](https://ai-encyclopedia.com/prompting/03-few-shot.html): Instructions describe a task; examples demonstrate it, and the model was trained on demonstration rather than description. Across nearly eve…
- [VOL III · 04 · Reasoning Controls: CoT to Effort Dials](https://ai-encyclopedia.com/prompting/04-reasoning.html): For two years, the phrase “let's think step by step” was among the highest-leverage edits in applied AI; models trained with reinforcement l…
- [VOL III · 05 · Structured Output & Tool-Ready Prompts](https://ai-encyclopedia.com/prompting/05-structured-output.html): Once a model's output feeds a program instead of a person, formatting stops being style and becomes a contract. This chapter works up a ladd…
- [VOL III · 06 · Self-Critique, Red Teams & Councils](https://ai-encyclopedia.com/prompting/06-adversarial.html): Everything a model emits in one pass is a draft: fluent, confident, and unexamined. The techniques here exploit one asymmetry. Models are me…
- [VOL III · 07 · Evaluation & The Prompt Lab](https://ai-encyclopedia.com/prompting/07-evaluation-lab.html): Six chapters of this volume have made claims: scaffolds beat bare asks, examples beat adjectives, critique loops catch what single passes mi…
- [VOL III · ⌘ · The Pattern Library](https://ai-encyclopedia.com/prompting/patterns.html): The seven chapters behind you are organized by technique. This page reorganizes the same material by the work itself, since at the keyboard…
- [VOL IV · 01 · From Chat to Agents: The Loop](https://ai-encyclopedia.com/agents/01-the-agentic-loop.html): A chatbot emits an answer and the episode ends. An agent emits an action, observes the result, and runs again. That difference is about ten…
- [VOL IV · 02 · Context Engineering](https://ai-encyclopedia.com/agents/02-context-engineering.html): Prompt engineering optimized one string for one call. Across a fifty-step loop, the question shifts: not how to phrase a request, but what s…
- [VOL IV · 03 · Tool Design & MCP](https://ai-encyclopedia.com/agents/03-tools-and-mcp.html): A tool is the API between a model and the world, and the model never sees your code, only its surface. The name, the description, and the pa…
- [VOL IV · 04 · Harness Engineering](https://ai-encyclopedia.com/agents/04-harness-engineering.html): A capable model wired straight into a shell is not a product. It is an incident waiting to happen. The harness, meaning the sandbox, permiss…
- [VOL IV · 05 · Loop Engineering & Multi-Agent Patterns](https://ai-encyclopedia.com/agents/05-loop-engineering.html): An agent is a loop, and a loop multiplies probabilities: fifty steps at 99% each is roughly a coin flip. Models improve on someone else's sc…
- [VOL IV · 06 · Evals, Observability & Cost](https://ai-encyclopedia.com/agents/06-evals-observability.html): The first five chapters of this volume covered how to build agents. This one covers how to know whether they work, why they fail, and what t…
- [FRAME · PyTorch](https://ai-encyclopedia.com/frameworks/01-pytorch.html): PyTorch extends the NumPy array model with two additions that account for most of its use in deep learning: GPU execution and automatic diff…
- [FRAME · TensorFlow & Keras](https://ai-encyclopedia.com/frameworks/02-tensorflow-keras.html): Keras reduces a working neural network to a few readable lines: stack layers, call compile, call fit. That brevity is the value and the haza…
- [FRAME · The Ecosystem & Deployment](https://ai-encyclopedia.com/frameworks/03-ecosystem-deployment.html): Training and deployment operate under different constraints. A model confined to a notebook has not shipped; ONNX, TorchScript, and the serv…
- [MM · Computer Vision with Deep Nets](https://ai-encyclopedia.com/multimodal/01-vision.html): Convolutional networks learned to read pixels, ImageNet made accuracy a shared benchmark, and recognition became the first task deep learnin…
- [MM · Multimodal LLMs](https://ai-encyclopedia.com/multimodal/02-multimodal-llms.html): A transformer treats its tokens as vectors to attend over, regardless of what they encode. An image becomes attendable once it is sliced int…
- [MM · Image & Video Generation](https://ai-encyclopedia.com/multimodal/03-image-generation.html): Text-to-image generation cycled through adversarial, autoregressive, and energy-based models before diffusion displaced them. One denoising…
- [MM · Speech & Audio Models](https://ai-encyclopedia.com/multimodal/04-speech-audio.html): Speech systems were once pipelines of hand-engineered parts: acoustic models, pronunciation lexicons, language models, and vocoders. Speech…
- [MM · World Models](https://ai-encyclopedia.com/multimodal/05-world-models.html): A language model predicts the next token; an agent acting in the world needs to predict the next state. World models learn the latent dynami…
- [MM · Embodied AI & Robotics](https://ai-encyclopedia.com/multimodal/06-embodied.html): Every modality so far in this volume describes the world: text, images, audio, video. Action is the modality that acts on it, and vision-lan…
- [OPEN · Open vs Closed Weights](https://ai-encyclopedia.com/openmodels/01-open-vs-closed.html): Whether you can hold the weights sits upstream of almost every other decision about a language model. If the parameters live on your disk ra…
- [OPEN · Running Open Models](https://ai-encyclopedia.com/openmodels/02-running-open-models.html): A frontier-class model can run on your laptop once you understand quantization, the serving engine, and the memory math. Open weights are a…
- [OPEN · Fine-Tuning Open Models](https://ai-encyclopedia.com/openmodels/03-finetuning-open.html): Closed APIs rent you a fixed behavior; open weights let you change it. Owning the weights means you can teach the model your domain, provide…
- [OPEN · Training Techniques in Practice](https://ai-encyclopedia.com/openmodels/04-training-techniques.html): Anyone can run a fine-tune. Making it learn without forgetting is harder. The difference between a model that learns your domain and one tha…
- [OPEN · Red-Teaming, Jailbreaks & Safety](https://ai-encyclopedia.com/openmodels/05-red-teaming.html): Deploying a model safely requires knowing how it fails. Red-teaming is the practice of breaking your own model before someone else does. Thi…

## Practice

- [The Gym](https://ai-encyclopedia.com/gym/index.html): drills, numeric problems and Python katas, graded client-side.

## Full content

- [llms-full.txt](https://ai-encyclopedia.com/llms-full.txt): the complete text of every chapter in one file; load it as context.