# The AI Encyclopedia

> A free, open, interactive encyclopedia of AI: machine learning from first
> principles, the complete LLM field manual, prompting, and agent engineering —
> with runnable Python and 70+ live instruments. https://ai-encyclopedia.com

## Volumes

- [STATS · Probability](https://ai-encyclopedia.com/stats/01-probability.html): Probability gives degrees of belief an arithmetic, built on three axioms. Conditioning is the operation that turns prior belief into posteri
- [STATS · Distributions](https://ai-encyclopedia.com/stats/02-distributions.html): A handful of named distributions account for most randomness in practice: coin flips, queue arrivals, measurement noise, market returns. Eac
- [STATS · Correlation & Causation](https://ai-encyclopedia.com/stats/03-descriptive-correlation.html): Correlation measures how two variables move together. Moving from correlation to causation requires a causal model, not more data. This chap
- [STATS · Statistical Inference & Hypothesis Testing](https://ai-encyclopedia.com/stats/04-inference-testing.html): You never observe the population, only a sample. Inference draws conclusions about the whole from the part, and it attaches a measure of its
- [STATS · Bayesian Inference](https://ai-encyclopedia.com/stats/05-bayesian.html): Frequentist statistics treats a parameter as a fixed unknown and the data as random; Bayesian inference reverses this. A parameter becomes a
- [STATS · Linear Algebra for Machine Learning](https://ai-encyclopedia.com/stats/06-linear-algebra.html): Beneath the framework and the GPU, almost every model in this encyclopedia is the same object: a stack of linear maps with some nonlinearity
- [STATS · Markov Chains & MCMC](https://ai-encyclopedia.com/stats/07-markov-chains.html): A process that forgets its past, a property called memorylessness, is enough to model PageRank and language and to sample from distributions
- [STATS · Information Theory](https://ai-encyclopedia.com/stats/08-information-theory.html): In 1948 Claude Shannon laid a foundation that still governs machine learning. He measured surprise as a number, entropy, and proved it is th
- [DATA · The Data Problem](https://ai-encyclopedia.com/data/01-the-data-problem.html): Most experiments are decided before a single gradient is computed. A model is only as trustworthy as its data split, and the most expensive 
- [DATA · Missing Data & Imputation](https://ai-encyclopedia.com/data/02-missing-data.html): Real datasets arrive with holes, and the holes are rarely random. How a value went missing constrains how you may fill it, and naive mean-im
- [DATA · Encoding, Scaling & Transforms](https://ai-encyclopedia.com/data/03-encoding-scaling.html): Models consume numbers, so the encoding of categories and the scaling of features often matters more than the choice of model. A linear mode
- [DATA · Feature Engineering & Selection](https://ai-encyclopedia.com/data/04-feature-engineering.html): The right feature can let a linear model beat a neural net. Feature engineering is the point where domain knowledge enters the math. This ch
- [DATA · Imbalanced Data](https://ai-encyclopedia.com/data/05-imbalanced.html): When 1 case in 1000 is the one that matters, as with fraud, disease, or default, accuracy stops being informative. Accuracy misleads under i
- [VOL I · 01 · Learning from Data](https://ai-encyclopedia.com/ml/01-learning-from-data.html): Regression lines, neural networks, and trillion-parameter language models all run on one idea. Instead of writing the rules yourself, you wr
- [VOL I · 02 · Linear Regression & Gradient Descent](https://ai-encyclopedia.com/ml/02-linear-regression.html): One model, a weighted sum. One loss, squared error. One algorithm, step downhill. Linear regression is the smallest setting in which the ful
- [VOL I · 03 · Classification: Logistic & Softmax](https://ai-encyclopedia.com/ml/03-classification.html): Chapter 02 predicted numbers. Most tasks instead ask for a choice: spam or not, benign or malignant, which of 100,000 tokens comes next. The
- [VOL I · 04 · Trees, Forests & Neighbors](https://ai-encyclopedia.com/ml/04-trees-and-neighbors.html): Not every model is a curve bent by gradient descent. This chapter covers methods that keep the training data instead of compressing it away:
- [VOL I · 05 · Clustering & Dimensionality](https://ai-encyclopedia.com/ml/05-unsupervised.html): Every method so far was handed the right answers. This chapter takes them away. With no labels, the model must find structure the data carri
- [VOL I · 06 · Generalization: Bias, Variance & Regularization](https://ai-encyclopedia.com/ml/06-generalization.html): Any model with enough knobs can score perfectly on data it has already seen. That is memorization, and it is worth nothing. The only error t
- [VOL I · 07 · Neural Networks: The MLP](https://ai-encyclopedia.com/ml/07-neural-networks.html): A linear model can draw exactly one flat boundary, and four points are enough to defeat it. The fix is small. Stack two linear maps with a n
- [VOL I · 08 · Backpropagation & Optimization](https://ai-encyclopedia.com/ml/08-backpropagation.html): Chapter 07 left a network with thirty-three knobs and one number telling it how wrong it is. This chapter is the algorithm that turns that o
- [VOL I · 09 · Naive Bayes & Generative Classifiers](https://ai-encyclopedia.com/ml/09-naive-bayes.html): Most classifiers learn where to draw the line between classes. A generative classifier instead learns to model each class, then asks which c
- [VOL I · 10 · Support Vector Machines & the Kernel Trick](https://ai-encyclopedia.com/ml/10-svm-kernels.html): Of all the lines that separate two classes, the SVM picks the one sitting in the widest empty corridor. Maximize that margin and a handful o
- [VOL I · Distance & Similarity Metrics](https://ai-encyclopedia.com/ml/11-distances-similarity.html): k-NN, every clustering algorithm, and every vector search rest on one decision that usually goes unexamined: how you measure "close". That s
- [VOL I · The Clustering Zoo](https://ai-encyclopedia.com/ml/12-clustering-zoo.html): k-means is fast and simple, but its squared-distance-to-a-centre objective can only carve the plane into round, equal, convex blobs. Hand it
- [VOL I · 13 · Matrix Factorization & SVD](https://ai-encyclopedia.com/ml/13-matrix-factorization.html): A ratings table, a term-document count, a pixel grid, an adjacency matrix: most large matrices that show up in practice are not full-rank. T
- [VOL I · 14 · Ensemble Methods](https://ai-encyclopedia.com/ml/14-ensembles.html): A single tree overfits, a single shallow model underfits, and any single model is a single point of failure. Combine many weak models the ri
- [VOL I · Gradient Boosting in Practice](https://ai-encyclopedia.com/ml/15-boosting-libraries.html): Open any tabular-data leaderboard and the top is usually the same three names. All three implement one idea, gradient boosting, and differ m
- [MLOPS · Resampling & Cross-Validation](https://ai-encyclopedia.com/mlops/01-resampling-cv.html): Holding out one slice of the data and scoring on it gives a number, but that number is itself a random draw. A different split would have pr
- [MLOPS · Hyperparameter Tuning](https://ai-encyclopedia.com/mlops/02-hyperparameter-tuning.html): Training fits the model parameters. The hyperparameters, such as learning rate, tree depth, and regularization strength, are set by a search
- [MLOPS · Metrics](https://ai-encyclopedia.com/mlops/03-regression-classification-metrics.html): A metric is not just a final report; it is the objective the pipeline optimizes toward, and it determines which errors the model is willing 
- [MLOPS · Ranking, Calibration, ROC, KS & PSI](https://ai-encyclopedia.com/mlops/04-ranking-calibration.html): A scoring model makes two separate promises, and most teams check only the first. One is correct ordering, placing risky cases above safe on
- [MLOPS · Stability & Drift](https://ai-encyclopedia.com/mlops/05-stability-drift.html): A model is trained once on a fixed snapshot, then deployed into an environment that keeps changing. As the input distribution and the input-
- [MLOPS · Explainability](https://ai-encyclopedia.com/mlops/06-explainability.html): A model that predicts well is not the same as a model you can account for. When a loan is denied, a tumour flagged, or a transaction blocked
- [MLOPS · MLOps & Model Governance](https://ai-encyclopedia.com/mlops/07-mlops-governance.html): Training a model is the easy part. Keeping it trustworthy after the notebook closes requires a reproducible pipeline, a registry that record
- [DL · Deep Learning Foundations](https://ai-encyclopedia.com/dl/01-foundations.html): A network with enough layers can in principle represent almost any function, yet for years deep stacks could not be trained. Activations and
- [DL · Convolutional Neural Networks](https://ai-encyclopedia.com/dl/02-cnn.html): A photograph carries structure that a dense layer discards: nearby pixels belong together, and an object keeps its identity wherever it sits
- [DL · Sequence Models](https://ai-encyclopedia.com/dl/03-sequence-models.html): A feed-forward network takes one fixed-size input and retains nothing between examples. A recurrent network reads a sequence one step at a t
- [DL · Seq2Seq & the Birth of Attention](https://ai-encyclopedia.com/dl/04-seq2seq-attention.html): An encoder reads a sentence and a decoder writes its translation. The 2014 design made both recurrent networks and passed a single state vec
- [DL · Autoencoders & VAEs](https://ai-encyclopedia.com/dl/05-autoencoders.html): Force a network to reconstruct its input through a narrow bottleneck and it learns the data's hidden coordinates, the few axes along which t
- [DL · Generative Adversarial Networks](https://ai-encyclopedia.com/dl/06-gans.html): Most generative models estimate how likely the data is and climb that gradient. GANs discard the likelihood entirely. Adversarial training p
- [DL · Training Deep Networks in Practice](https://ai-encyclopedia.com/dl/07-training-deep-nets.html): A network's architecture decides what it can represent; training decides whether it gets there. The optimizer, the learning-rate schedule, a
- [RL · The Reinforcement Learning Problem](https://ai-encyclopedia.com/rl/01-the-rl-problem.html): Supervised learning hands the model a fixed set of labeled examples to imitate. Reinforcement learning gives an agent only a scalar reward a
- [RL · Dynamic Programming](https://ai-encyclopedia.com/rl/02-dynamic-programming.html): The previous chapter posed the control problem and the value functions that summarize it. This chapter solves it exactly, under one strong a
- [RL · Model-Free Value Methods](https://ai-encyclopedia.com/rl/03-model-free-value.html): Dynamic programming could solve any MDP, provided you handed it the transition probabilities and rewards. Real agents are not handed the mod
- [RL · Policy Gradients & Actor-Critic](https://ai-encyclopedia.com/rl/04-policy-gradients.html): Every method so far has been indirect: estimate how good each action is, then act greedily with respect to those estimates. Policy-gradient 
- [RL · Deep Reinforcement Learning](https://ai-encyclopedia.com/rl/05-deep-rl.html): The tabular methods of the earlier chapters store one number per state, which is impractical the instant the state is a screen of pixels or 
- [RL · RL Meets LLMs](https://ai-encyclopedia.com/rl/06-rl-and-llms.html): For most of this volume the agent acted in a maze, a game, or a control loop. Now the environment is a conversation and the agent is a langu
- [GAME · Games & Equilibria](https://ai-encyclopedia.com/game-theory/01-games-equilibria.html): In single-agent optimization, "optimal" means picking the action with the highest payoff. Once a player's reward depends on what other ratio
- [GAME · Repeated & Cooperative Games](https://ai-encyclopedia.com/game-theory/02-repeated-cooperative.html): In a single encounter, rational self-interest can drive two players to an outcome both of them reject, as the Prisoner's Dilemma demonstrate
- [GAME · Games in AI](https://ai-encyclopedia.com/game-theory/03-games-in-ai.html): Supervised learning is bounded by its teacher: a model can only chase the labels a human already wrote. Framing learning as a game lets the 
- [TIME · Time Series Fundamentals](https://ai-encyclopedia.com/timeseries/01-fundamentals.html): Most models assume the rows are interchangeable, so shuffling them loses nothing. Attach a clock and that assumption fails: yesterday shapes
- [TIME · AR, MA, ARIMA & SARIMA](https://ai-encyclopedia.com/timeseries/02-arima.html): Before neural networks reached forecasting, Box and Jenkins reduced it to a procedure that could be taught. The recipe has three steps: diff
- [TIME · Exponential Smoothing & Holt-Winters](https://ai-encyclopedia.com/timeseries/03-exponential-smoothing.html): Where ARIMA works through correlations of past errors, exponential smoothing makes a simpler assumption and performs well on it. It weights 
- [TIME · Volatility Modeling](https://ai-encyclopedia.com/timeseries/04-volatility-garch.html): Returns are close to unforecastable, but their size is not. Volatility clusters: calm periods follow calm periods and large moves follow lar
- [TIME · Multivariate Time Series](https://ai-encyclopedia.com/timeseries/05-multivariate.html): When several series move together, a Vector Autoregression captures their feedback: every variable is regressed on the recent past of all th
- [TIME · Forecasting in Practice](https://ai-encyclopedia.com/timeseries/06-forecasting-practice.html): Every earlier chapter showed you how to fit a model to a time series. This one covers how to judge whether it is any good, evaluated on data
- [QUANT · Stochastic Processes](https://ai-encyclopedia.com/quant/01-stochastic-processes.html): In continuous time, prices follow a path that is jagged at every scale, so the ordinary calculus of smooth curves no longer applies. Itô's l
- [QUANT · Binomial Option Pricing](https://ai-encyclopedia.com/quant/02-binomial-pricing.html): An option appears to depend on whether the stock rises or falls, yet its price does not. You can price an option without knowing the stock's
- [QUANT · Black–Scholes & the Greeks](https://ai-encyclopedia.com/quant/03-black-scholes.html): Black, Scholes and Merton established a single no-arbitrage argument with one striking consequence. A stock's volatility, not its expected r
- [QUANT · Interest-Rate Models](https://ai-encyclopedia.com/quant/04-interest-rate-models.html): In Black–Scholes the rate \(r\) was a constant you looked up; here it becomes the quantity being modelled. The field rests on one stylized f
- [QUANT · Monte Carlo Methods in Finance](https://ai-encyclopedia.com/quant/05-monte-carlo.html): A price is an expectation, and an expectation can be estimated by averaging samples. When no closed form exists, Monte Carlo simulates many 
- [QUANT · Risk Measurement](https://ai-encyclopedia.com/quant/06-risk-measurement.html): Every portfolio has a distribution of next-day outcomes, and market-risk management summarizes the loss tail of that distribution into figur
- [VOL II · 01 · Foundations](https://ai-encyclopedia.com/chapters/01-foundations.html): Underneath the scale and engineering, a large language model computes one function: given a sequence of tokens, it returns a probability dis
- [VOL II · 02 · The Transformer](https://ai-encyclopedia.com/chapters/02-transformer.html): Every frontier model is a decoder-only transformer: a stack of identical blocks that read from and write to a shared workspace called the re
- [VOL II · 03 · Attention](https://ai-encyclopedia.com/chapters/03-attention.html): Attention performs a differentiable soft lookup. Every position publishes what it holds (keys, values) and what it wants (queries), and info
- [VOL II · 04 · Pre-training](https://ai-encyclopedia.com/chapters/04-pretraining.html): Pre-training spends a compute budget, months of time on tens of thousands of accelerators, to push cross-entropy as low as physics and econo
- [VOL II · 05 · Post-training](https://ai-encyclopedia.com/chapters/05-posttraining.html): A base model knows things, but it does not yet behave. Post-training is the comparatively small but decisive stage that converts a next-toke
- [VOL II · 06 · Fine-tuning](https://ai-encyclopedia.com/chapters/06-finetuning.html): Adapting a pre-trained model to your task is mostly a question of which parameters you allow to move. This chapter covers the spectrum from 
- [VOL II · 07 · Compression](https://ai-encyclopedia.com/chapters/07-compression.html): Generating a token requires streaming every weight through the chip. At decode time LLMs are memory-bandwidth-bound, so the bits each weight
- [VOL II · 08 · Inference & Deployment](https://ai-encyclopedia.com/chapters/08-inference.html): Serving has its own physics: a compute-bound prefill followed by a memory-bound decode, with cost riding on cache management, batching polic
- [VOL II · 09 · The Frontier](https://ai-encyclopedia.com/chapters/09-frontier.html): The dense decoder of Chapters 02 and 03 is now the baseline rather than the frontier. Production flagships route tokens through expert subne
- [VOL II · 10 · Diffusion](https://ai-encyclopedia.com/chapters/10-diffusion.html): The second major family of generative models works differently from next-token prediction. The procedure is to destroy data with noise, then
- [VOL II · The 2026 Frontier](https://ai-encyclopedia.com/chapters/11-frontier-2026.html): For eight years the Transformer had no serious rival. That changed. State-space models now match attention's quality at linear cost, and pos
- [VOL II · Capstone · The Full Stack](https://ai-encyclopedia.com/chapters/capstone.html): Ten chapters compress into two instruments. First, design a frontier model, where every slider invokes an equation you have already met, fro
- [VOL III · 01 · How Models Read Prompts](https://ai-encyclopedia.com/prompting/01-how-prompts-work.html): A prompt functions as the condition in a conditional probability: every token you write reshapes the distribution over the tokens the model 
- [VOL III · 02 · The Scaffold: Role · Task · Context · Format · Constraints](https://ai-encyclopedia.com/prompting/02-the-scaffold.html): A weak prompt is usually short on information, not cleverness. The scaffold is a five-part checklist that puts the conditioning the model ca
- [VOL III · 03 · Show, Don't Tell: Few-Shot & Examples](https://ai-encyclopedia.com/prompting/03-few-shot.html): Instructions describe a task; examples demonstrate it, and the model was trained on demonstration rather than description. Across nearly eve
- [VOL III · 04 · Reasoning Controls: CoT to Effort Dials](https://ai-encyclopedia.com/prompting/04-reasoning.html): For two years, the phrase “let's think step by step” was among the highest-leverage edits in applied AI; models trained with reinforcement l
- [VOL III · 05 · Structured Output & Tool-Ready Prompts](https://ai-encyclopedia.com/prompting/05-structured-output.html): Once a model's output feeds a program instead of a person, formatting stops being style and becomes a contract. This chapter works up a ladd
- [VOL III · 06 · Self-Critique, Red Teams & Councils](https://ai-encyclopedia.com/prompting/06-adversarial.html): Everything a model emits in one pass is a draft: fluent, confident, and unexamined. The techniques here exploit one asymmetry. Models are me
- [VOL III · 07 · Evaluation & The Prompt Lab](https://ai-encyclopedia.com/prompting/07-evaluation-lab.html): Six chapters of this volume have made claims: scaffolds beat bare asks, examples beat adjectives, critique loops catch what single passes mi
- [VOL III · ⌘ · The Pattern Library](https://ai-encyclopedia.com/prompting/patterns.html): The seven chapters behind you are organized by technique. This page reorganizes the same material by the work itself, since at the keyboard 
- [VOL IV · 01 · From Chat to Agents: The Loop](https://ai-encyclopedia.com/agents/01-the-agentic-loop.html): A chatbot emits an answer and the episode ends. An agent emits an action, observes the result, and runs again. That difference is about ten 
- [VOL IV · 02 · Context Engineering](https://ai-encyclopedia.com/agents/02-context-engineering.html): Prompt engineering optimized one string for one call. Across a fifty-step loop, the question shifts: not how to phrase a request, but what s
- [VOL IV · 03 · Tool Design & MCP](https://ai-encyclopedia.com/agents/03-tools-and-mcp.html): A tool is the API between a model and the world, and the model never sees your code, only its surface. The name, the description, and the pa
- [VOL IV · 04 · Harness Engineering](https://ai-encyclopedia.com/agents/04-harness-engineering.html): A capable model wired straight into a shell is not a product. It is an incident waiting to happen. The harness, meaning the sandbox, permiss
- [VOL IV · 05 · Loop Engineering & Multi-Agent Patterns](https://ai-encyclopedia.com/agents/05-loop-engineering.html): An agent is a loop, and a loop multiplies probabilities: fifty steps at 99% each is roughly a coin flip. Models improve on someone else's sc
- [VOL IV · 06 · Evals, Observability & Cost](https://ai-encyclopedia.com/agents/06-evals-observability.html): The first five chapters of this volume covered how to build agents. This one covers how to know whether they work, why they fail, and what t
- [FRAME · PyTorch](https://ai-encyclopedia.com/frameworks/01-pytorch.html): PyTorch extends the NumPy array model with two additions that account for most of its use in deep learning: GPU execution and automatic diff
- [FRAME · TensorFlow & Keras](https://ai-encyclopedia.com/frameworks/02-tensorflow-keras.html): Keras reduces a working neural network to a few readable lines: stack layers, call compile, call fit. That brevity is the value and the haza
- [FRAME · The Ecosystem & Deployment](https://ai-encyclopedia.com/frameworks/03-ecosystem-deployment.html): Training and deployment operate under different constraints. A model confined to a notebook has not shipped; ONNX, TorchScript, and the serv
- [MM · Computer Vision with Deep Nets](https://ai-encyclopedia.com/multimodal/01-vision.html): Convolutional networks learned to read pixels, ImageNet made accuracy a shared benchmark, and recognition became the first task deep learnin
- [MM · Multimodal LLMs](https://ai-encyclopedia.com/multimodal/02-multimodal-llms.html): A transformer treats its tokens as vectors to attend over, regardless of what they encode. An image becomes attendable once it is sliced int
- [MM · Image & Video Generation](https://ai-encyclopedia.com/multimodal/03-image-generation.html): Text-to-image generation cycled through adversarial, autoregressive, and energy-based models before diffusion displaced them. One denoising 
- [MM · Speech & Audio Models](https://ai-encyclopedia.com/multimodal/04-speech-audio.html): Speech systems were once pipelines of hand-engineered parts: acoustic models, pronunciation lexicons, language models, and vocoders. Speech 
- [MM · World Models](https://ai-encyclopedia.com/multimodal/05-world-models.html): A language model predicts the next token; an agent acting in the world needs to predict the next state. World models learn the latent dynami
- [MM · Embodied AI & Robotics](https://ai-encyclopedia.com/multimodal/06-embodied.html): Every modality so far in this volume describes the world: text, images, audio, video. Action is the modality that acts on it, and vision-lan
- [OPEN · Open vs Closed Weights](https://ai-encyclopedia.com/openmodels/01-open-vs-closed.html): Whether you can hold the weights sits upstream of almost every other decision about a language model. If the parameters live on your disk ra
- [OPEN · Running Open Models](https://ai-encyclopedia.com/openmodels/02-running-open-models.html): A frontier-class model can run on your laptop once you understand quantization, the serving engine, and the memory math. Open weights are a 
- [OPEN · Fine-Tuning Open Models](https://ai-encyclopedia.com/openmodels/03-finetuning-open.html): Closed APIs rent you a fixed behavior; open weights let you change it. Owning the weights means you can teach the model your domain, provide
- [OPEN · Training Techniques in Practice](https://ai-encyclopedia.com/openmodels/04-training-techniques.html): Anyone can run a fine-tune. Making it learn without forgetting is harder. The difference between a model that learns your domain and one tha
- [OPEN · Red-Teaming, Jailbreaks & Safety](https://ai-encyclopedia.com/openmodels/05-red-teaming.html): Deploying a model safely requires knowing how it fails. Red-teaming is the practice of breaking your own model before someone else does. Thi

## Practice

- [The Gym](https://ai-encyclopedia.com/gym/index.html): drills, numeric problems and Python katas, graded client-side.

## Full content

- [llms-full.txt](https://ai-encyclopedia.com/llms-full.txt): the complete text of every chapter in one file — load it as context.