Value at Risk: the definition
A portfolio's profit-and-loss over a horizon \(h\) (one day, ten days) is a random variable \(L\) — by convention, a loss, so positive \(L\) means money lost. Value at Risk asks one question: pick a confidence level \(\alpha\) (say 99%); what is the loss threshold the portfolio will not exceed with probability \(\alpha\)? It is, precisely, a quantile of the loss distribution.
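Written out, a standard form of that quantile definition is shown here. It is presumably what later sections cite as EQ Q6.1; the infimum notation is the usual textbook convention and is an assumption, not taken from the source.

\[
\mathrm{VaR}_\alpha(L) \;=\; \inf\{\ell \in \mathbb{R} : \Pr(L \le \ell) \ge \alpha\}
\]

When the loss has a continuous, strictly increasing CDF \(F_L\), this reduces to \(\mathrm{VaR}_\alpha(L) = F_L^{-1}(\alpha)\).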
Three conventions trip up newcomers, so fix them now. First, sign: VaR is reported as a positive number ("our 99% VaR is $4.2M"), even though it lives in the left tail of P&L. Second, confidence vs. tail: 99% VaR and "the 1% tail" are the same object — \(\alpha = 0.99\) means a tail probability \(p = 1 - \alpha = 0.01\). Third, horizon: a one-day VaR scales to \(h\) days under the i.i.d.-normal assumption by the square-root-of-time rule, \(\mathrm{VaR}_h \approx \sqrt{h}\,\mathrm{VaR}_1\) — an approximation that quietly fails when returns are autocorrelated or fat-tailed.
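The square-root-of-time rule can be checked by simulation. The sketch below is illustrative only: the volatility `sigma`, the AR(1) coefficient `rho`, and the helper `h_day_vs_rule` are hypothetical choices, not values from the text. It compares a simulated 10-day VaR with \(\sqrt{h}\) times the one-day VaR, first for i.i.d. normal losses and then for positively autocorrelated ones.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, h, n_paths = 0.99, 10, 200_000
sigma = 0.01   # hypothetical daily loss volatility
rho = 0.3      # hypothetical AR(1) coefficient for the autocorrelated case

def h_day_vs_rule(daily):
    """Return (simulated h-day VaR, sqrt(h) * 1-day VaR) for a paths-by-days array."""
    var_1 = np.quantile(daily[:, 0], alpha)
    var_h = np.quantile(daily.sum(axis=1), alpha)
    return var_h, np.sqrt(daily.shape[1]) * var_1

# Case 1: i.i.d. normal daily losses, where the rule should hold.
shocks = sigma * rng.standard_normal((n_paths, h))
var_iid, rule_iid = h_day_vs_rule(shocks)

# Case 2: positively autocorrelated losses, L_t = rho * L_{t-1} + shock_t.
ar = np.empty_like(shocks)
ar[:, 0] = shocks[:, 0]
for t in range(1, h):
    ar[:, t] = rho * ar[:, t - 1] + shocks[:, t]
var_ar, rule_ar = h_day_vs_rule(ar)

print(f"i.i.d. : simulated {var_iid:.4f} vs rule {rule_iid:.4f}")
print(f"AR(1)  : simulated {var_ar:.4f} vs rule {rule_ar:.4f}")
```

In the i.i.d. case the two numbers should agree up to sampling noise. With positive autocorrelation the simulated 10-day VaR should come out clearly above the rule, because losses on consecutive days reinforce each other.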
VaR's power is sociological as much as mathematical. Before RiskMetrics popularized it in 1994–96, a bank's market risk lived in a hundred incomparable desk reports. VaR gave one currency-denominated number that aggregates across asset classes, lets a CRO set a firm-wide limit, and lets a regulator demand capital against it. That it throws away the shape of the tail is the price of that universality — and the reason the field spent the next thirty years patching it.
For a quick desk estimate, assume losses are normal with mean \(\mu\) and standard deviation \(\sigma\). Then the quantile has a closed form, and almost every parametric VaR you will ever see is this one line:
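A reconstruction of that line, consistent with the parametric code later in this section (the symbol \(z_\alpha\) for the standard-normal quantile is notation assumed here):

\[
\mathrm{VaR}_\alpha \;=\; \mu + z_\alpha\,\sigma, \qquad z_\alpha = \Phi^{-1}(\alpha), \qquad z_{0.99} \approx 2.326
\]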
Already the cracks show. The Gaussian VaR multiplier 2.326 assumes returns are normal; in reality equity returns have kurtosis well above 3, so the true 1% quantile sits further out than the formula admits. Two of the three methods in §6.2 exist precisely to escape this assumption.
Historical, parametric & Monte-Carlo VaR
EQ Q6.1 defines a quantile of the loss distribution — but you never have that distribution; you estimate it. The three industry methods are exactly three answers to "where does the loss distribution come from?"
| Method | Loss distribution from… | Strength | Weakness |
|---|---|---|---|
| Historical | the empirical sample of past returns | No distributional assumption; captures real fat tails & skew | Backward-looking; blind to risks absent from the window; ghost effects when crises age out |
| Parametric | a fitted model (usually Gaussian) — EQ Q6.2 | Fast, analytic, aggregates linearly via the covariance matrix | Normality under-states tails; useless for non-linear payoffs (options) |
| Monte-Carlo | simulated paths from a chosen model | Handles non-linearity, path dependence, any distribution you can sample | Slow; only as good as the model you simulate; sampling noise |
Historical VaR is the most honest and the most popular. Take \(N\) past daily P&L observations, sort them, and read off the empirical \(\alpha\) quantile of the losses (equivalently, the \((1-\alpha)\) quantile of P&L). No bell curve is assumed; if the last year held a 6-sigma day, it sits right there in the sample.
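One common way to write that empirical quantile is offered here as a reconstruction of what the text cites as EQ Q6.3. The ceiling convention is an assumption; libraries such as NumPy interpolate between neighbouring order statistics by default, so their answer can differ slightly.

\[
\widehat{\mathrm{VaR}}_\alpha \;=\; L_{(\lceil \alpha N \rceil)}, \qquad L_{(1)} \le L_{(2)} \le \dots \le L_{(N)}
\]

With \(N = 1000\) and \(\alpha = 0.99\) this picks the 990th smallest loss, which is the 11th largest.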
Parametric VaR is EQ Q6.2 with \(\sigma\) estimated from the same window (or from a volatility model — an EWMA or the GARCH machinery of Quant 01). For a linear multi-asset book it aggregates beautifully: portfolio variance is \(\mathbf{w}^\top \Sigma\, \mathbf{w}\), so one covariance matrix prices the VaR of the whole firm. The original RiskMetrics methodology was exactly this — Gaussian, EWMA-weighted covariance, square-root-of-time scaling.
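The covariance aggregation can be sketched in a few lines. Everything numeric below is hypothetical (a three-asset book with made-up position values, volatilities and correlations), and the mean P&L is assumed to be zero, which is a common simplification at a one-day horizon.

```python
import numpy as np

# Hypothetical three-asset book; all numbers are illustrative.
w = np.array([4e6, 3e6, 3e6])             # position values (currency units)
vols = np.array([0.012, 0.008, 0.020])    # daily return volatilities
corr = np.array([[1.0, 0.3, 0.5],
                 [0.3, 1.0, 0.2],
                 [0.5, 0.2, 1.0]])
Sigma = np.outer(vols, vols) * corr       # covariance matrix of daily returns

z = 2.326                                 # Phi^{-1}(0.99)
sd_port = np.sqrt(w @ Sigma @ w)          # std dev of portfolio P&L
var_port = z * sd_port                    # zero-mean Gaussian 99% VaR
var_standalone = z * np.abs(w) * vols     # each position on its own

print(f"portfolio 99% VaR     : {var_port:,.0f}")
print(f"sum of standalone VaRs: {var_standalone.sum():,.0f}")
```

Because the correlations are below one, the portfolio VaR comes out below the sum of the standalone figures. Under the Gaussian model that diversification benefit is entirely a property of \(\Sigma\).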
Monte-Carlo VaR earns its cost when payoffs are non-linear. An options book's P&L is not a linear function of the underlying, so neither the empirical return sample nor a single \(\sigma\) captures it. Instead you simulate thousands of market scenarios (using the path machinery of Quant 05), fully reprice the book under each, and take the empirical quantile of the simulated P&L — the same order-statistic of EQ Q6.3, but on simulated rather than historical losses.
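A minimal sketch of that simulate-and-reprice loop follows. It assumes a hypothetical position (short 100 at-the-money European calls), a one-day lognormal shock to the underlying with volatility held fixed, and Black-Scholes as the repricing model. The helpers `norm_cdf` and `bs_call` are written here for illustration and are not from the source.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard-normal CDF via math.erf, vectorised over NumPy arrays."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm_cdf(d1) - K * np.exp(-r * tau) * norm_cdf(d2)

# Hypothetical position and market parameters (illustrative values only).
S0, K, r, vol, tau = 100.0, 100.0, 0.02, 0.25, 0.25
n_opts, alpha, n_sims = -100, 0.99, 20_000    # negative count = short calls
dt = 1.0 / 252.0

rng = np.random.default_rng(7)
shock = rng.standard_normal(n_sims)
S1 = S0 * np.exp((r - 0.5 * vol**2) * dt + vol * np.sqrt(dt) * shock)

v0 = n_opts * bs_call(S0, K, r, vol, tau)
v1 = n_opts * bs_call(S1, K, r, vol, tau - dt)   # full repricing per scenario
losses = -(v1 - v0)
var_mc = np.quantile(losses, alpha)

# Delta-only (linear) approximation on the same scenarios, for comparison.
d1 = (np.log(S0 / K) + (r + 0.5 * vol**2) * tau) / (vol * np.sqrt(tau))
delta = n_opts * norm_cdf(d1)
var_lin = np.quantile(-(delta * (S1 - S0)), alpha)

print(f"full-repricing 99% VaR: {var_mc:.2f}")
print(f"delta-only 99% VaR    : {var_lin:.2f}")
```

For this short-option position the full-repricing VaR should exceed the delta-only figure, since the linear approximation ignores the convexity that hurts a short call when the underlying rallies.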
```python
# Historical & parametric VaR + Expected Shortfall from a return series
import numpy as np

rng = np.random.default_rng(0)
# 1000 days of fat-tailed returns (Student-t, df=4): heavier tails than normal
rets = 0.01 * rng.standard_t(df=4, size=1000)  # daily returns, ~1% scale
losses = -rets                                 # convention: loss = -return
alpha = 0.99

# Historical VaR/ES: empirical quantile, then mean of the worse tail (EQ Q6.3, Q6.4)
var_h = np.quantile(losses, alpha)
es_h = losses[losses >= var_h].mean()

# Parametric Gaussian VaR/ES (EQ Q6.2): z=2.326, ES uses phi(z)/(1-alpha)
mu, sd = losses.mean(), losses.std(ddof=1)
z = 2.326                                        # Phi^{-1}(0.99)
var_p = mu + z * sd
phi = np.exp(-0.5 * z * z) / np.sqrt(2 * np.pi)  # standard-normal pdf at z
es_p = mu + sd * phi / (1 - alpha)               # Gaussian ES

print(f"sample sd (daily) : {sd*100:.3f} %")
print(f"historical 99% VaR : {var_h*100:.3f} % ES: {es_h*100:.3f} %")
print(f"parametric 99% VaR : {var_p*100:.3f} % ES: {es_p*100:.3f} %")
print(f"\nfat tails -> historical VaR exceeds the Gaussian one by "
      f"{(var_h-var_p)/var_p*100:+.1f}% (normality under-states the tail)")
```
Expected Shortfall (CVaR)
VaR tells you the edge of the bad zone and nothing about its depth. Two portfolios can share an identical 99% VaR while one loses a little past it and the other is wiped out — VaR cannot tell them apart. Expected Shortfall (also Conditional VaR, CVaR, or expected tail loss) fixes this by averaging the losses that do breach VaR:
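A reconstruction of that tail average, presumably the text's EQ Q6.4 (the integral form is the standard general definition and is added here as an assumption about what the original showed):

\[
\mathrm{ES}_\alpha(L) \;=\; \mathbb{E}\!\left[\,L \mid L \ge \mathrm{VaR}_\alpha(L)\,\right] \;=\; \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_u(L)\,du
\]

The two expressions agree when the loss distribution is continuous. For discrete losses only the integral form is guaranteed to behave coherently, because it splits any probability atom sitting exactly at VaR.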
For the Gaussian case ES has a clean closed form, which makes the relationship to VaR exact and lets you feel the multiplier:
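A reconstruction of that closed form, matching the `es_p` line in the code of §6.2, with \(z_\alpha = \Phi^{-1}(\alpha)\) and \(\varphi\) the standard-normal density:

\[
\mathrm{ES}_\alpha \;=\; \mu + \sigma\,\frac{\varphi(z_\alpha)}{1-\alpha}, \qquad \varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}
\]

At \(\alpha = 0.99\) the multiplier \(\varphi(z_\alpha)/(1-\alpha)\) is about 2.665, against 2.326 for VaR, so with zero mean the Gaussian ES sits roughly 15% beyond the Gaussian VaR.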
The deeper reason ES won is theoretical. Artzner, Delbaen, Eber and Heath (1999) laid down four axioms any sensible risk measure should obey — monotonicity, translation invariance, positive homogeneity, and subadditivity — and called a measure satisfying all four coherent. Subadditivity is the one that bites: \(\rho(A + B) \le \rho(A) + \rho(B)\), i.e. diversification cannot increase risk. ES is coherent. VaR is not — it can violate subadditivity, reporting a merged portfolio as riskier than the sum of its parts, which is not just ugly but actively penalizes hedging and diversification.
VaR can punish diversification. Take two independent corporate bonds, each defaulting with probability 4% (loss 100, else 0). The 95% VaR of each alone is 0: the 4% default probability sits inside the 5% tail, so the 95th-percentile loss is zero. But a portfolio holding both suffers at least one default with probability \(1 - 0.96^2 \approx 7.8\% > 5\%\), so its 95% VaR is 100. VaR of the pair (100) exceeds the sum of the parts (0 + 0): diversifying made the reported risk explode. ES suffers no such pathology, because averaging over the whole tail restores subadditivity. Failures of this kind are a large part of why the Basel framework migrated from VaR to ES.
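The two-bond example can be verified exactly. The sketch below uses a hypothetical helper, `var_es_discrete`, which computes VaR as the smallest loss whose cumulative probability reaches \(\alpha\), and ES as the average over the worst \(1-\alpha\) of probability mass. That tail-average form of ES is an assumption of this sketch; it is the version that stays subadditive for discrete losses.

```python
import numpy as np

def var_es_discrete(losses, probs, alpha):
    """VaR and ES at level alpha for a discrete loss distribution.

    VaR is the smallest loss l with P(L <= l) >= alpha. ES averages the worst
    (1 - alpha) of probability mass, splitting the atom at VaR if needed.
    """
    order = np.argsort(losses)
    loss_sorted = np.asarray(losses, dtype=float)[order]
    prob_sorted = np.asarray(probs, dtype=float)[order]
    cdf = np.cumsum(prob_sorted)
    # First index whose cumulative probability reaches alpha (small float guard).
    i = int(np.searchsorted(cdf, alpha - 1e-12))
    var = loss_sorted[i]
    tail = 1.0 - alpha
    beyond = prob_sorted[i + 1:].sum()    # P(L > VaR)
    es = (np.dot(loss_sorted[i + 1:], prob_sorted[i + 1:])
          + var * (tail - beyond)) / tail
    return var, es

pd_ = 0.04     # default probability of each bond (from the example)
alpha = 0.95

# One bond: lose 100 on default, else 0.
var_a, es_a = var_es_discrete([0.0, 100.0], [1 - pd_, pd_], alpha)

# Two independent bonds: 0, 1 or 2 defaults.
var_ab, es_ab = var_es_discrete(
    [0.0, 100.0, 200.0],
    [(1 - pd_) ** 2, 2 * pd_ * (1 - pd_), pd_ ** 2],
    alpha,
)

print(f"VaR: single {var_a:.1f}, pair {var_ab:.1f}, sum of singles {2 * var_a:.1f}")
print(f"ES : single {es_a:.1f}, pair {es_ab:.1f}, sum of singles {2 * es_a:.1f}")
```

Under these assumptions each bond alone has a 95% ES of 80 and the pair has a 95% ES of 103.2, below the sum of 160, while VaR jumps from 0 + 0 to 100. A naive conditional mean \(\mathbb{E}[L \mid L \ge \mathrm{VaR}]\) would not give these figures here, which is why the atom-splitting definition matters for discrete portfolios.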
ES is not free of trouble. It is harder to backtest than VaR — you are estimating a conditional mean from a handful of tail observations, and a clean pass/fail test (the subject of §6.4) was elusive for years. VaR remains the easier number to validate, which is why both live side by side in the modern regime: ES sets the capital, VaR-style exceedance counting still polices the model.
Backtesting VaR — Kupiec & the traffic light
A VaR number is a falsifiable prediction: at 99% confidence, losses should exceed VaR on about 1% of days. So count the exceedances (days where realized loss > that day's VaR forecast) and ask whether the count is consistent with the model. Over \(T\) days at tail probability \(p = 1 - \alpha\), the number of exceptions \(x\) is, under a correct model, Binomial\((T, p)\) with expected value \(pT\).
Kupiec (1995) made "too many" precise with a likelihood-ratio test of the unconditional coverage — the proportion-of-failures (POF) test. It compares the model's claimed rate \(p\) against the observed rate \(x/T\):
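A reconstruction of the POF statistic, term for term the same as the `num`, `den` and `LR` lines in the backtest code later in this section:

\[
\mathrm{LR}_{uc} \;=\; -2 \ln\!\left[\frac{(1-p)^{T-x}\,p^{x}}{\left(1-\tfrac{x}{T}\right)^{T-x}\left(\tfrac{x}{T}\right)^{x}}\right]
\]

Under a correctly calibrated model \(\mathrm{LR}_{uc}\) is asymptotically \(\chi^2(1)\), so at the 5% level the model is rejected when the statistic exceeds 3.841.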
Kupiec's test only counts exceptions; it ignores when they happened. Christoffersen (1998) added an independence test (exceptions should not cluster — a model that fails on five consecutive days is broken even if the count looks fine) and combined the two into a conditional-coverage test. The two ideas — right number, well-spaced — are the backbone of every modern VaR backtest.
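The independence idea can be sketched as a likelihood-ratio test on transitions between exception and non-exception days. The function below is a minimal illustration of a first-order Markov test in the spirit of Christoffersen (1998), not a reference implementation; the names `christoffersen_ind` and `count_times_log` and the two synthetic hit series are hypothetical.

```python
import numpy as np

def count_times_log(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * np.log(prob)

def christoffersen_ind(hits):
    """LR statistic for independence of a 0/1 exception series.

    Compares a first-order Markov chain for the hits against an i.i.d.
    Bernoulli model; asymptotically chi2(1) under independence.
    """
    hits = np.asarray(hits, dtype=int)
    prev, curr = hits[:-1], hits[1:]
    n00 = int(((prev == 0) & (curr == 0)).sum())
    n01 = int(((prev == 0) & (curr == 1)).sum())
    n10 = int(((prev == 1) & (curr == 0)).sum())
    n11 = int(((prev == 1) & (curr == 1)).sum())

    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0   # P(hit | no hit yesterday)
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0   # P(hit | hit yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)         # unconditional hit rate

    ll_null = count_times_log(n00 + n10, 1 - pi) + count_times_log(n01 + n11, pi)
    ll_alt = (count_times_log(n00, 1 - pi01) + count_times_log(n01, pi01)
              + count_times_log(n10, 1 - pi11) + count_times_log(n11, pi11))
    return -2.0 * (ll_null - ll_alt)

# Two synthetic 250-day series, each with five exceptions.
clustered = np.zeros(250, dtype=int)
clustered[100:105] = 1                    # five consecutive exceptions
spread = np.zeros(250, dtype=int)
spread[[20, 70, 120, 170, 220]] = 1       # five well-separated exceptions

lr_clustered = christoffersen_ind(clustered)
lr_spread = christoffersen_ind(spread)
print(f"LR_ind clustered: {lr_clustered:.2f}")
print(f"LR_ind spread   : {lr_spread:.2f}")
```

Both series have the same exception count, so Kupiec's test cannot separate them. The independence statistic should be far above the 3.841 critical value for the clustered series and well below it for the spread one.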
Regulators encode a blunter, more operational version: the Basel traffic light. Count exceptions over the last 250 trading days at 99% and bucket the firm:
| Zone | Exceptions (250 days, 99%) | Cumulative P(≤x) under correct model | Consequence |
|---|---|---|---|
| Green | 0 – 4 | 8.11% – 89.22% | Model accepted; no capital penalty |
| Yellow | 5 – 9 | 95.88% – 99.97% | Multiplier raised (+0.40 to +0.85); supervisory scrutiny |
| Red | 10+ | ≥ 99.99% | Model presumed flawed; maximum plus factor (+1.00), remediation demanded |
The boundaries are not arbitrary; they come from the Binomial. At 99% over 250 days the expected count is 2.5, and the probability of seeing 10 or more exceptions if the model is correct is about 0.03%, so 10 exceptions is overwhelming evidence the model is wrong, not bad luck. The yellow zone is the honest middle: bad enough to worry, not damning enough to condemn. The asymmetry (no penalty for too few exceptions) is deliberate: the supervisor cares about under-stated risk, not over-caution.
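The zone boundaries can be recomputed directly from the Binomial(250, 0.01) distribution. This is a small sketch using only the standard library; `binom_cdf` is a helper written here for illustration.

```python
from math import comb

T, p = 250, 0.01

def binom_cdf(k, n, q):
    """P(X <= k) for X ~ Binomial(n, q), by direct summation."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k + 1))

for k in (4, 5, 9, 10):
    print(f"P(x <= {k:2d}) = {binom_cdf(k, T, p):.2%}")

p_red = 1 - binom_cdf(9, T, p)   # chance a correct model lands in the red zone
print(f"P(x >= 10) = {p_red:.3%}")
```

The cumulative probability first passes 95% at five exceptions and reaches 99.99% at ten, which is where the yellow and red zones begin.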
```python
# Backtest VaR: count exceedances vs expected, Kupiec POF + Basel zone
import numpy as np

rng = np.random.default_rng(1)
T, alpha = 250, 0.99
p = 1 - alpha                               # tail probability = 0.01

# A model whose VaR is too LOOSE: true vol 20% above the VaR forecast's vol
sd_true, sd_model = 0.012, 0.010
losses = -sd_true * rng.standard_normal(T)  # realized losses
var_fcst = 2.326 * sd_model                 # constant 99% Gaussian VaR
x = int((losses > var_fcst).sum())          # exceptions
exp = p * T                                 # expected exceptions (EQ Q6.6)

# Kupiec POF likelihood-ratio (EQ Q6.7); guard the x=0 edge case
ph = x / T
num = (1 - p)**(T - x) * p**x
den = (1 - ph)**(T - x) * (ph**x if x > 0 else 1.0)
LR = -2 * np.log(num / den) if x > 0 else -2 * np.log((1 - p)**T)

zone = "GREEN" if x <= 4 else ("YELLOW" if x <= 9 else "RED")
print(f"exceptions observed : {x} (expected {exp:.1f})")
print(f"Kupiec LR_uc : {LR:.3f} crit chi2(1,95%) = 3.841 "
      f"-> {'REJECT model' if LR > 3.841 else 'cannot reject'}")
print(f"Basel traffic light : {zone}")
```
Stress testing & the Basel context
VaR and ES are statistical measures: they extrapolate from a sampled or modelled distribution, and they are only as good as that distribution's grip on the future. Their structural blind spot is the event the data never contained — a regime that has not happened in the window, or has not happened at all. Stress testing is the deliberate complement: instead of asking "what does the distribution say about the tail?", it asks "what happens to the book under this specific scenario, whether or not the distribution thinks it likely?"
- Historical scenarios. Replay a named crisis through today's book: the 1987 crash, the 2008 collapse, the 2020 COVID shock, the 2022 rate spike. No probability is attached — you simply reprice under those moves. Answers "if October 2008 happened again to this portfolio, what would we lose?"
- Hypothetical scenarios. Constructed shocks the past never delivered: a coordinated 300 bp rate move with equities down 25% and credit spreads doubling. Forces the desk to imagine correlations snapping to one — the exact failure mode VaR's covariance matrix smooths over.
- Reverse stress testing. Invert the question: what scenario breaks us? Solve for the set of market moves that exhausts capital, then judge how plausible that scenario is. Mandated post-2008 precisely because forward stress tests tend to test the shocks management already fears, not the ones that kill.
- Sensitivity / factor shocks. Bump one risk factor at a time (parallel yield-curve shift, vol surface up 10 points) to map where the book is most exposed — the macro cousin of the Greeks from Quant 03.
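The scenario mechanics above can be sketched for a book described by linear factor sensitivities. Every number below is hypothetical (the sensitivities, the scenario shocks, and the capital buffer), and a real stress test would reprice the book in full and not rely on linear sensitivities. The last step is a one-directional reverse stress test: it scales a single scenario until losses match capital.

```python
import numpy as np

# Hypothetical linear sensitivities of a book (P&L per unit factor move).
factors = ["equity_pct", "rates_bp", "credit_spread_pct"]
sens = np.array([
    2.0e5,     # P&L per +1% equity move
    -1.5e4,    # P&L per +1 bp parallel rate move
    -3.0e4,    # P&L per +1% relative widening of credit spreads
])

# Hypothetical scenarios, expressed as factor moves in the units above.
scenarios = {
    "equity crash": np.array([-25.0, 0.0, 0.0]),
    "rate spike": np.array([0.0, 300.0, 0.0]),
    "combined hypothetical": np.array([-25.0, 300.0, 100.0]),
}

# Scenario P&L: no probability attached, just the book repriced under each move.
pnl = {name: float(sens @ shock) for name, shock in scenarios.items()}
for name, value in pnl.items():
    print(f"{name:22s}: {value:>14,.0f}")

# Reverse stress test along one direction: what fraction of the combined
# scenario exhausts a (hypothetical) capital buffer?
capital = 5.0e6
breaking_scale = capital / -pnl["combined hypothetical"]
print(f"capital exhausted at {breaking_scale:.0%} of the combined scenario")
```

Searching over a single scenario direction is the simplest possible reverse stress test. A fuller version would search over many directions and then ask which breaking scenario is the most plausible.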
VaR answers "how bad on a normal-ish bad day?"; stress testing answers "how bad on the day the model is wrong?" The 2007–09 crisis was a catalogue of VaR's limits: short estimation windows had not seen a housing collapse, Gaussian copulas mis-priced tail correlation, and liquidity vanished from instruments the models assumed tradeable. Banks reporting comfortable VaRs lost multiples of them. Stress testing is the institutional memory the rolling window keeps erasing.
The regulatory arc reflects exactly the lessons of this chapter. The 1996 Market Risk Amendment let banks use internal 99% / 10-day VaR models for capital, with the traffic-light backtest of §6.4 as the discipline. After 2008, Basel 2.5 bolted on a stressed VaR (the model re-estimated over a crisis window) to fight the procyclicality of short windows. Then the Fundamental Review of the Trading Book (FRTB, finalized 2019) made the decisive move: it replaced 99% VaR with 97.5% Expected Shortfall as the basis for internal-model capital, while keeping VaR exceedance counting as the backtest.
No single number is the truth. ES is coherent but harder to backtest and still distribution-dependent. Stress tests are scenario-dependent and can become theatre — testing the shocks everyone already prices in. Historical VaR re-fights the last war; parametric VaR assumes a bell curve markets do not obey. The competent risk function runs all of them, treats each as a partial view, and reserves its deepest distrust for any meeting where one number is presented as the risk. The 2008 survivors were not those with the lowest VaR — they were those who did not believe it.
You have reached the end of the Quantitative Finance volume. From the stochastic processes that model a price (Quant 01), through binomial and Black–Scholes pricing (Quant 02–03), interest-rate models and Monte-Carlo valuation (Quant 04–05), to the risk measurement that governs whether any of it is safe to hold (Quant 06), the loop is closed: model the world, price the claim, then measure honestly how wrong the model can be.
References
- Artzner, P., Delbaen, F., Eber, J.-M. & Heath, D. (1999). Coherent Measures of Risk.
- J.P. Morgan / Reuters (1996). RiskMetrics — Technical Document (4th ed.).
- Kupiec, P. H. (1995). Techniques for Verifying the Accuracy of Risk Measurement Models.
- Basel Committee on Banking Supervision (2019). Minimum Capital Requirements for Market Risk (FRTB, finalized).
- Christoffersen, P. F. (1998). Evaluating Interval Forecasts.
- Rockafellar, R. T. & Uryasev, S. (2000). Optimization of Conditional Value-at-Risk.
- Hull, J. C. (2021). Options, Futures, and Other Derivatives (11th ed.).