Risk Measurement — VaR, CVaR & Stress Testing

6.1

Value at Risk: the definition

A portfolio's loss over a horizon $h$ (one day, ten days) is a random variable $L$: the negative of its profit-and-loss, so positive $L$ means money lost. Value at Risk asks one question: pick a confidence level $\alpha$ (say 99%); what is the smallest loss threshold that the portfolio stays at or below with probability at least $\alpha$? It is, precisely, a quantile of the loss distribution.

EQ Q6.1 — VALUE AT RISK $$ \mathrm{VaR}_\alpha(L) \;=\; \inf\{\, \ell \in \mathbb{R} : \Pr(L \le \ell) \ge \alpha \,\} \;=\; F_L^{-1}(\alpha) $$

$F_L$ is the cumulative distribution of the loss; $F_L^{-1}(\alpha)$ is its $\alpha$-quantile. The 99% one-day VaR is the loss such that only 1 day in 100 should be worse. Note what VaR does not say: it is silent about how much worse those bad days are. It is a threshold, not an average — and that single omission is the seed of every criticism in §6.3.
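A minimal sketch of EQ Q6.1 on a discrete loss distribution may make the infimum concrete. The five-outcome loss table below is hypothetical, invented purely for illustration; the function scans outcomes in ascending order and returns the first loss whose cumulative probability reaches $\alpha$.

```python
def value_at_risk(outcomes, alpha):
    """Smallest loss l with P(L <= l) >= alpha (EQ Q6.1, generalized inverse).

    outcomes: list of (loss, probability) pairs whose probabilities sum to 1.
    """
    cumulative = 0.0
    for loss, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= alpha - 1e-12:   # small tolerance for float round-off
            return loss
    raise ValueError("probabilities sum to less than alpha")

# Hypothetical loss table: (loss in $M, probability); negative loss = profit
table = [(-1.0, 0.50), (0.5, 0.30), (2.0, 0.15), (4.0, 0.04), (10.0, 0.01)]

var_95 = value_at_risk(table, 0.95)
var_99 = value_at_risk(table, 0.99)
print(f"95% VaR = {var_95}   99% VaR = {var_99}")
```

In this toy table the 99% VaR stops at the 4.0 outcome even though a loss of 10.0 is possible, which is the silence about the tail that the paragraph above describes.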

Three conventions trip up newcomers, so fix them now. First, sign: VaR is reported as a positive number ("our 99% VaR is $4.2M"), even though it lives in the left tail of P&L. Second, confidence vs. tail: 99% VaR and "the 1% tail" are the same object — $\alpha = 0.99$ means a tail probability $p = 1 - \alpha = 0.01$. Third, horizon: a one-day VaR scales to $h$ days under the i.i.d.-normal assumption by the square-root-of-time rule, $\mathrm{VaR}_h \approx \sqrt{h}\,\mathrm{VaR}_1$ — an approximation that quietly fails when returns are autocorrelated or fat-tailed.
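To see how the square-root-of-time rule can drift, here is a small sketch under stated assumptions: zero-mean Gaussian returns that follow a stationary AR(1) process with lag-1 autocorrelation `rho`, for which the variance of an $h$-day sum has a closed form. The one-day VaR figure and the `rho` values are hypothetical.

```python
import math

def var_h_sqrt_time(var_1day, h):
    """Square-root-of-time rule: exact for i.i.d. zero-mean normal returns."""
    return math.sqrt(h) * var_1day

def var_h_ar1(var_1day, h, rho):
    """h-day Gaussian VaR when daily returns are a stationary AR(1), zero mean."""
    # Var(sum of h returns) = sigma^2 * [h + 2 * sum_{k=1}^{h-1} (h - k) * rho^k]
    factor = h + 2 * sum((h - k) * rho**k for k in range(1, h))
    return math.sqrt(factor) * var_1day

one_day = 4.65   # hypothetical 99% one-day VaR, in percent
for rho in (0.0, 0.1, -0.1):
    print(f"rho={rho:+.1f}  sqrt-time: {var_h_sqrt_time(one_day, 10):.2f}%  "
          f"AR(1): {var_h_ar1(one_day, 10, rho):.2f}%")
```

With `rho = 0` the two agree; positive autocorrelation pushes the true ten-day figure above the rule, and negative autocorrelation pulls it below.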

WHY ONE NUMBER

VaR's power is sociological as much as mathematical. Before RiskMetrics popularized it in 1994–96, a bank's market risk lived in a hundred incomparable desk reports. VaR gave one currency-denominated number that aggregates across asset classes, lets a CRO set a firm-wide limit, and lets a regulator demand capital against it. That it throws away the shape of the tail is the price of that universality — and the reason the field spent the next thirty years patching it.

For a quick desk estimate, assume losses are normal with mean $\mu$ and standard deviation $\sigma$. Then the quantile has a closed form, and almost every parametric VaR you will ever see is this one line:

EQ Q6.2 — PARAMETRIC (GAUSSIAN) VaR $$ \mathrm{VaR}_\alpha \;=\; \mu + z_\alpha\,\sigma, \qquad z_\alpha = \Phi^{-1}(\alpha), \qquad z_{0.95} = 1.645,\;\; z_{0.99} = 2.326 $$

$\Phi^{-1}$ is the standard-normal inverse CDF; $z_\alpha$ is the number of standard deviations into the tail. For a portfolio with zero mean (the usual short-horizon assumption — drift is negligible over a day), VaR collapses to $z_\alpha\,\sigma$: the 99% VaR is simply 2.326 times the daily volatility. This is the workhorse formula, and also the one that most badly under-states risk when returns have fat tails.
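EQ Q6.2 fits in a few lines using the standard library's `statistics.NormalDist`. This is only a sketch: the function name `gaussian_var` and the 1.5% volatility are illustrative assumptions, not part of any particular risk system.

```python
from statistics import NormalDist

def gaussian_var(mu, sigma, alpha):
    """Parametric VaR of a normally distributed loss (EQ Q6.2)."""
    z = NormalDist().inv_cdf(alpha)       # z_alpha = Phi^{-1}(alpha)
    return mu + z * sigma

# Hypothetical desk: zero mean, 1.5% daily loss volatility
for alpha in (0.95, 0.99):
    print(f"alpha={alpha}: VaR = {gaussian_var(0.0, 1.5, alpha):.3f}%")
```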

A portfolio has daily return volatility $\sigma = 2\%$ and effectively zero mean over one day. Using the Gaussian formula (EQ Q6.2), what is its 99% one-day VaR, in percent? (Use $z_{0.99} = 2.326$.)

Zero mean, so $\mathrm{VaR}_{0.99} = z_{0.99}\,\sigma = 2.326 \times 2\% = $ 4.65%. A 99% one-day VaR of 4.65% means the desk expects to lose more than 4.65% of the book on roughly one trading day in a hundred, which is about two or three days a year.

Already the cracks show. The Gaussian VaR multiplier 2.326 assumes returns are normal; in reality equity returns have kurtosis well above 3, so the true 1% quantile sits further out than the formula admits. Two of the three methods in §6.2 exist precisely to escape this assumption.

6.2

Historical, parametric & Monte-Carlo VaR

EQ Q6.1 defines a quantile of the loss distribution — but you never have that distribution; you estimate it. The three industry methods are exactly three answers to "where does the loss distribution come from?"

Method	Loss distribution from…	Strength	Weakness
Historical	the empirical sample of past returns	No distributional assumption; captures real fat tails & skew	Backward-looking; blind to risks absent from the window; ghost effects when crises age out
Parametric	a fitted model (usually Gaussian) — EQ Q6.2	Fast, analytic, aggregates linearly via the covariance matrix	Normality under-states tails; useless for non-linear payoffs (options)
Monte-Carlo	simulated paths from a chosen model	Handles non-linearity, path dependence, any distribution you can sample	Slow; only as good as the model you simulate; sampling noise

Historical VaR is the most honest and the most popular. Take $N$ past daily P&L observations, convert them to losses, sort them, and read off the empirical $\alpha$-quantile of the losses (equivalently, the $(1-\alpha)$ quantile of P&L). No bell curve is assumed; if the last year held a 6-sigma day, it sits right there in the sample.

EQ Q6.3 — HISTORICAL VaR (EMPIRICAL QUANTILE) $$ \widehat{\mathrm{VaR}}_\alpha \;=\; L_{(\lceil \alpha N \rceil)}, \qquad L_{(1)} \le L_{(2)} \le \cdots \le L_{(N)} \;\;\text{(losses, ascending)} $$

Sort the $N$ historical losses; the $\lceil \alpha N \rceil$-th order statistic is the estimate. With $N = 500$ days and $\alpha = 0.99$, that is position $\lceil 0.99 \times 500 \rceil = 495$ from the bottom, i.e. the 6th-largest loss in the sample, with the 5 worst losses lying beyond it. Its great virtue is that it inherits whatever fat tails the market actually printed; its great vice is that it is mute about anything the window never saw.
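The order-statistic rule of EQ Q6.3 can be sketched directly. The sample below is hypothetical (250 seeded Gaussian draws standing in for daily losses), and `historical_var` is an illustrative helper name.

```python
import math
import random

def historical_var(losses, alpha):
    """ceil(alpha * N)-th smallest loss (EQ Q6.3); positions are counted from 1."""
    ordered = sorted(losses)
    n = len(ordered)
    k = math.ceil(round(alpha * n, 9))    # round() guards against float noise in alpha * n
    return ordered[k - 1], k

random.seed(0)                             # hypothetical data: 250 daily losses, in percent
sample = [random.gauss(0.0, 1.0) for _ in range(250)]

var_99, k = historical_var(sample, 0.99)
worse = sum(1 for x in sample if x > var_99)
print(f"position {k} of {len(sample)}, VaR = {var_99:.3f}%, observations beyond it: {worse}")
```

With 250 observations the estimate rests on only the top two or three points of the sample, which is one way to see why historical VaR gets noisy as the window shrinks.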

Parametric VaR is EQ Q6.2 with $\sigma$ estimated from the same window (or from a volatility model — an EWMA or the GARCH machinery of Quant 01). For a linear multi-asset book it aggregates beautifully: portfolio variance is $\mathbf{w}^\top \Sigma\, \mathbf{w}$, so one covariance matrix prices the VaR of the whole firm. The original RiskMetrics methodology was exactly this — Gaussian, EWMA-weighted covariance, square-root-of-time scaling.
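A sketch of the covariance aggregation, assuming a zero-mean linear book. The two-asset exposures, volatilities and correlation are hypothetical, and the double sum simply spells out $\mathbf{w}^\top \Sigma\, \mathbf{w}$ with $\Sigma_{ij} = \sigma_i \sigma_j \rho_{ij}$.

```python
import math
from statistics import NormalDist

def portfolio_gaussian_var(weights, vols, corr, alpha):
    """Zero-mean Gaussian VaR of a linear book: z_alpha * sqrt(w' Sigma w)."""
    n = len(weights)
    variance = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * corr[i][j]
        for i in range(n) for j in range(n)
    )
    return NormalDist().inv_cdf(alpha) * math.sqrt(variance)

# Hypothetical two-asset book: exposures in $M, daily vols as fractions
w    = [60.0, 40.0]
vols = [0.020, 0.010]
corr = [[1.0, 0.3], [0.3, 1.0]]

total = portfolio_gaussian_var(w, vols, corr, 0.99)
standalone = [portfolio_gaussian_var([w[i]], [vols[i]], [[1.0]], 0.99) for i in range(2)]
print(f"portfolio VaR {total:.3f}  vs  sum of standalone VaRs {sum(standalone):.3f}")
```

Because the correlation is below one, the portfolio figure comes out smaller than the sum of the standalone figures; under the Gaussian model, VaR behaves subadditively.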

Monte-Carlo VaR earns its cost when payoffs are non-linear. An options book's P&L is not a linear function of the underlying, so neither the empirical return sample nor a single $\sigma$ captures it. Instead you simulate thousands of market scenarios (using the path machinery of Quant 05), fully reprice the book under each, and take the empirical quantile of the simulated P&L — the same order-statistic of EQ Q6.3, but on simulated rather than historical losses.
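The simulate, reprice, take-the-quantile loop can be sketched for a single option. Everything specific here is an assumption for illustration: a long at-the-money European call priced with Black-Scholes, an instantaneous lognormal spot shock with no drift, implied volatility held fixed, and 20,000 scenarios. It compares full repricing with a delta-only (linear) approximation.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()

def d1(spot, strike, vol, t, r=0.0):
    return (math.log(spot / strike) + (r + 0.5 * vol**2) * t) / (vol * math.sqrt(t))

def bs_call(spot, strike, vol, t, r=0.0):
    """Black-Scholes price of a European call (no dividends)."""
    a = d1(spot, strike, vol, t, r)
    return spot * N.cdf(a) - strike * math.exp(-r * t) * N.cdf(a - vol * math.sqrt(t))

def quantile_loss(losses, alpha):
    """ceil(alpha * N)-th order statistic, as in EQ Q6.3."""
    ordered = sorted(losses)
    return ordered[math.ceil(round(alpha * len(ordered), 9)) - 1]

# Hypothetical position: long one 3-month at-the-money call, 20% implied vol
spot, strike, vol, t = 100.0, 100.0, 0.20, 0.25
daily_sd = vol / math.sqrt(252)                 # one-day return vol of the underlying
price0 = bs_call(spot, strike, vol, t)
delta0 = N.cdf(d1(spot, strike, vol, t))        # call delta at the current spot

random.seed(42)
full, linear = [], []
for _ in range(20_000):
    shocked = spot * math.exp(daily_sd * random.gauss(0.0, 1.0))   # simulated spot
    full.append(price0 - bs_call(shocked, strike, vol, t))          # loss under full repricing
    linear.append(-delta0 * (shocked - spot))                       # loss under delta-only approx

var_full, var_delta = quantile_loss(full, 0.99), quantile_loss(linear, 0.99)
print(f"99% VaR full reprice: {var_full:.3f}   delta-only: {var_delta:.3f}")
```

For a long call the price is convex in spot, so the linear approximation over-states the loss in every scenario; for a short option position the error runs the other way, which is the case where full repricing matters most.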

You hold $N = 500$ sorted daily losses and want the 99% historical VaR (EQ Q6.3). Counting from the smallest loss as position 1, which order-statistic position do you read off? (Compute $\lceil \alpha N \rceil$ with $\alpha = 0.99$.)

$\lceil 0.99 \times 500 \rceil = \lceil 495 \rceil = $ 495. Position 495 from the bottom of 500 is the 6th-largest loss; the 5 larger losses above it (positions 496–500) form the tail beyond VaR, exactly the $1\% \times 500 = 5$ observations you expect past a 99% threshold.

PYTHON · RUNNABLE IN-BROWSER

# Historical & parametric VaR + Expected Shortfall from a return series
import numpy as np
rng = np.random.default_rng(0)

# 1000 days of fat-tailed returns (Student-t, df=4): heavier tails than normal
rets = 0.01 * rng.standard_t(df=4, size=1000)     # daily returns, ~1% scale
losses = -rets                                     # convention: loss = -return
alpha = 0.99

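# Historical VaR/ES: empirical quantile, then mean of the worse tail (EQ Q6.3, Q6.4).
# np.quantile interpolates linearly between order statistics by default, so var_h can
# differ slightly from the exact ceil(alpha*N)-th order statistic of EQ Q6.3.
var_h = np.quantile(losses, alpha)
es_h  = losses[losses >= var_h].mean()

# Parametric Gaussian VaR/ES (EQ Q6.2, Q6.5): z=2.326, ES uses phi(z)/(1-alpha)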
mu, sd = losses.mean(), losses.std(ddof=1)
z   = 2.326                                         # Phi^{-1}(0.99)
var_p = mu + z * sd
phi   = np.exp(-0.5 * z * z) / np.sqrt(2 * np.pi)   # standard-normal pdf at z
es_p  = mu + sd * phi / (1 - alpha)                 # Gaussian ES

print(f"sample sd (daily)     : {sd*100:.3f} %")
print(f"historical  99% VaR   : {var_h*100:.3f} %   ES: {es_h*100:.3f} %")
print(f"parametric  99% VaR   : {var_p*100:.3f} %   ES: {es_p*100:.3f} %")
print(f"\nfat tails -> historical VaR exceeds the Gaussian one by "
      f"{(var_h-var_p)/var_p*100:+.1f}%  (normality under-states the tail)")


INSTRUMENT Q6.1 — VaR / ES TAIL EXPLORER · P&L DENSITY · CONFIDENCE α · EQ Q6.1–Q6.4

[Interactive figure. Controls: confidence α (default 99.0%), daily vol σ (default 2.0%), tail fatness as kurtosis (default 3.0). Readouts: VaR at α, Expected Shortfall, ES / VaR ratio.]

The red region is the loss tail beyond VaR — its area is exactly $1-\alpha$. The mint line marks VaR (the tail's edge); the blue line marks ES (the tail's centre of mass), always further out. Slide tail fatness above 3 and watch the density grow heavy tails: VaR creeps right, but ES sprints — the gap between them is the chapter's whole argument. Push α toward 99.5% and the red sliver shrinks while both lines march outward.

6.3

Expected Shortfall (CVaR)

VaR tells you the edge of the bad zone and nothing about its depth. Two portfolios can share an identical 99% VaR while one loses a little past it and the other is wiped out — VaR cannot tell them apart. Expected Shortfall (also Conditional VaR, CVaR, or expected tail loss) fixes this by averaging the losses that do breach VaR:

EQ Q6.4 — EXPECTED SHORTFALL $$ \mathrm{ES}_\alpha(L) \;=\; \mathbb{E}\!\left[\, L \mid L \ge \mathrm{VaR}_\alpha \,\right] \;=\; \frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_u(L)\,\mathrm{d}u $$

ES is the average loss conditional on being in the worst $(1-\alpha)$ tail: the mean of everything past the VaR cliff. The integral form shows it as the average of all VaRs deeper than $\alpha$; the two forms agree whenever the loss distribution is continuous. The integral form also makes the key property obvious: ES $\ge$ VaR at the same level, always, since you are averaging values that are all $\ge \mathrm{VaR}_\alpha$. Equality holds only in the degenerate case where the tail beyond VaR is a single point.

For the Gaussian case ES has a clean closed form, which makes the relationship to VaR exact and lets you feel the multiplier:

EQ Q6.5 — GAUSSIAN EXPECTED SHORTFALL $$ \mathrm{ES}_\alpha \;=\; \mu + \sigma\,\frac{\varphi\big(z_\alpha\big)}{1-\alpha}, \qquad \varphi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}, \qquad \frac{\varphi(2.326)}{0.01} \approx 2.665 $$

$\varphi$ is the standard-normal density. At 99% the ES multiplier is $\approx 2.665$ versus the VaR multiplier $2.326$ — so for a zero-mean Gaussian book, ES is about 15% larger than VaR. Crucially this is the thin-tailed gap; under fat tails ES pulls away much faster, which is exactly why regulators switched to it (§6.5).
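The Gaussian multipliers are easy to check numerically. The sketch below uses `statistics.NormalDist` from the standard library; the helper name `gaussian_es_multiplier` is illustrative.

```python
from statistics import NormalDist

def gaussian_es_multiplier(alpha):
    """phi(z_alpha) / (1 - alpha): ES of a standard normal loss (EQ Q6.5)."""
    n = NormalDist()
    z = n.inv_cdf(alpha)
    return n.pdf(z) / (1 - alpha)

for alpha in (0.95, 0.99):
    z = NormalDist().inv_cdf(alpha)
    es = gaussian_es_multiplier(alpha)
    print(f"alpha={alpha}: VaR mult {z:.3f}, ES mult {es:.3f}, ES/VaR {es / z:.3f}")
```

The ES/VaR ratio shrinks as $\alpha$ rises: a Gaussian tail decays so quickly that the mean beyond a far-out threshold sits only a little past the threshold itself.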

The deeper reason ES won is theoretical. Artzner, Delbaen, Eber and Heath (1999) laid down four axioms any sensible risk measure should obey — monotonicity, translation invariance, positive homogeneity, and subadditivity — and called a measure satisfying all four coherent. Subadditivity is the one that bites: $\rho(A + B) \le \rho(A) + \rho(B)$, i.e. diversification cannot increase risk. ES is coherent. VaR is not — it can violate subadditivity, reporting a merged portfolio as riskier than the sum of its parts, which is not just ugly but actively penalizes hedging and diversification.

THE SUBADDITIVITY TRAP

VaR can punish diversification. Take two independent corporate bonds, each defaulting with probability 4% (loss 100, else 0). The 95% VaR of each alone is 0: the 4% default sits inside the 5% tail, so the 95th-percentile loss is zero. But a portfolio of both suffers at least one default with probability $1 - 0.96^2 \approx 7.8\% > 5\%$, so its 95% VaR is 100. VaR of the pair (100) exceeds the sum of the parts (0 + 0): diversifying made the reported risk explode. ES suffers no such pathology: averaging over the whole tail restores subadditivity. Examples like this one are a large part of why the Basel framework migrated off VaR.
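The two-bond example can be checked by brute force. This sketch uses the bond from the text and computes ES with the integral form of EQ Q6.4, which is the form that remains well behaved when the distribution has point masses, as a default indicator does. The helper names are illustrative.

```python
from itertools import product

def var_discrete(dist, alpha):
    """EQ Q6.1 on a discrete distribution given as {loss: probability}."""
    cumulative = 0.0
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= alpha - 1e-12:
            return loss
    raise ValueError("probabilities sum to less than alpha")

def es_discrete(dist, alpha):
    """Integral form of EQ Q6.4: average of VaR_u over u in (alpha, 1)."""
    total, lower = 0.0, 0.0
    for loss in sorted(dist):
        upper = lower + dist[loss]            # this loss equals VaR_u for u in (lower, upper]
        total += loss * max(0.0, min(upper, 1.0) - max(lower, alpha))
        lower = upper
    return total / (1 - alpha)

bond = {0: 0.96, 100: 0.04}                   # single bond: default with probability 4%
pair = {}
for (l1, p1), (l2, p2) in product(bond.items(), repeat=2):   # independent defaults
    pair[l1 + l2] = pair.get(l1 + l2, 0.0) + p1 * p2

alpha = 0.95
print("VaR: parts", 2 * var_discrete(bond, alpha), " pair", var_discrete(pair, alpha))
print("ES : parts", round(2 * es_discrete(bond, alpha), 1), " pair", round(es_discrete(pair, alpha), 1))
```

The VaR line shows the pair exceeding the sum of the parts, while the ES line shows the pair coming in below the sum, as subadditivity requires.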

True or false: at the same confidence level $\alpha$, Expected Shortfall is always at least as large as Value at Risk — $\mathrm{ES}_\alpha \ge \mathrm{VaR}_\alpha$. (Answer true or false.)

By EQ Q6.4, ES is the mean of losses that are all $\ge \mathrm{VaR}_\alpha$ (they live beyond the VaR threshold), so their average cannot be smaller than that threshold. Hence $\mathrm{ES}_\alpha \ge \mathrm{VaR}_\alpha$ for every distribution, with equality only when the entire tail collapses to a point. The statement is true.

INSTRUMENT Q6.2 — THREE-METHOD VaR COMPARISON · HISTORICAL vs PARAMETRIC vs MONTE-CARLO · 99% / 95%

[Interactive figure. Controls: confidence α (default 99.0%), tail fatness as kurtosis (default 6.0), sample size N (default 750). Readouts: historical VaR, parametric VaR, Monte-Carlo VaR.]

A deterministic synthetic loss sample (fixed seed, so it renders identically on load). The three bars are the same 99% VaR computed three ways on the same data. At kurtosis 3 they nearly agree — Gaussian is true, so parametric is fine. Crank tail fatness up and the parametric bar falls badly behind: the normal model cannot see the tail the historical and Monte-Carlo estimates capture. Shrink N and watch the historical estimate get noisy — the cost of being assumption-free is needing data.

ES is not free of trouble. It is harder to backtest than VaR: you are estimating a conditional mean from a handful of tail observations, and a clean pass/fail test of the kind §6.4 builds for VaR was elusive for years. VaR remains the easier number to validate, which is why both live side by side in the modern regime: ES sets the capital, VaR-style exceedance counting still polices the model.

6.4

Backtesting VaR — Kupiec & the traffic light

A VaR number is a falsifiable prediction: at 99% confidence, losses should exceed VaR on about 1% of days. So count the exceedances (days where realized loss > that day's VaR forecast) and ask whether the count is consistent with the model. Over $T$ days at tail probability $p = 1 - \alpha$, the number of exceptions $x$ is, under a correct model, Binomial$(T, p)$ with expected value $pT$.

EQ Q6.6 — EXPECTED EXCEPTIONS $$ x \sim \mathrm{Binomial}(T, p), \qquad \mathbb{E}[x] = pT, \qquad p = 1 - \alpha $$

Over one regulatory year ($T = 250$ trading days) at 99% ($p = 0.01$), you expect $0.01 \times 250 = 2.5$ exceptions. Too few and your VaR is needlessly conservative (wasting capital); too many and it is dangerously optimistic. The whole game is deciding when an observed count $x$ is "too many" — and randomness means even a perfect model occasionally throws 6 or 7.

Kupiec (1995) made "too many" precise with a likelihood-ratio test of the unconditional coverage — the proportion-of-failures (POF) test. It compares the model's claimed rate $p$ against the observed rate $x/T$:

EQ Q6.7 — KUPIEC POF LIKELIHOOD-RATIO TEST $$ \mathrm{LR}_{\text{uc}} = -2\ln\!\left[ \frac{(1-p)^{T-x}\,p^{\,x}}{\left(1-\tfrac{x}{T}\right)^{T-x}\left(\tfrac{x}{T}\right)^{x}} \right] \;\;\overset{H_0}{\sim}\;\; \chi^2_1 $$

The numerator is the likelihood under the model's rate $p$; the denominator under the observed rate $\hat p = x/T$. Under the null "the model is correctly calibrated", $\mathrm{LR}_{\text{uc}}$ follows a chi-squared with 1 degree of freedom. Reject the model when $\mathrm{LR}_{\text{uc}} > 3.841$ (the 95% $\chi^2_1$ critical value). It is a two-sided test — it flags both too many exceptions and suspiciously too few.
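To get a feel for where the rejection region falls, the statistic of EQ Q6.7 can be tabulated over a range of exception counts. This is a sketch in log space (to avoid underflow for large $T$), with the convention $0 \cdot \ln 0 = 0$; `kupiec_lr` is an illustrative name.

```python
import math

def kupiec_lr(x, T, p):
    """Kupiec POF statistic (EQ Q6.7), in log space; 0 * log(0) is taken as 0."""
    def loglik(rate):
        ll = 0.0
        if x > 0:
            ll += x * math.log(rate)
        if x < T:
            ll += (T - x) * math.log(1 - rate)
        return ll
    return -2 * (loglik(p) - loglik(x / T))

T, p, crit = 250, 0.01, 3.841
for x in range(0, 11):
    lr = kupiec_lr(x, T, p)
    print(f"x={x:2d}  LR={lr:6.3f}  {'reject' if lr > crit else 'cannot reject'}")
```

For $T = 250$ and $p = 0.01$ the table rejects at zero exceptions and again from seven upward, which illustrates the two-sided nature of the test.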

Kupiec's test only counts exceptions; it ignores when they happened. Christoffersen (1998) added an independence test (exceptions should not cluster — a model that fails on five consecutive days is broken even if the count looks fine) and combined the two into a conditional-coverage test. The two ideas — right number, well-spaced — are the backbone of every modern VaR backtest.
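A sketch of the independence idea, assuming the usual first-order Markov formulation: count transitions between "no exception" (0) and "exception" (1) days, then compare a model with a single exception rate against one where the rate depends on yesterday's state. The statistic is compared with the same $\chi^2_1$ critical value. The two hit series are hypothetical.

```python
import math

def christoffersen_independence_lr(hits):
    """LR statistic for first-order independence of a 0/1 exception series."""
    n = [[0, 0], [0, 0]]                       # n[i][j]: transitions from state i to state j
    for prev, curr in zip(hits, hits[1:]):
        n[prev][curr] += 1

    def loglik(count_no_hit, count_hit, prob_hit):
        ll = 0.0                               # 0 * log(0) is taken as 0
        if count_hit:
            ll += count_hit * math.log(prob_hit)
        if count_no_hit:
            ll += count_no_hit * math.log(1 - prob_hit)
        return ll

    pi   = (n[0][1] + n[1][1]) / (len(hits) - 1)                     # pooled exception rate
    pi01 = n[0][1] / (n[0][0] + n[0][1]) if n[0][0] + n[0][1] else 0.0   # rate after a quiet day
    pi11 = n[1][1] / (n[1][0] + n[1][1]) if n[1][0] + n[1][1] else 0.0   # rate after an exception

    restricted   = loglik(n[0][0] + n[1][0], n[0][1] + n[1][1], pi)
    unrestricted = loglik(n[0][0], n[0][1], pi01) + loglik(n[1][0], n[1][1], pi11)
    return -2 * (restricted - unrestricted)

# Hypothetical 250-day hit series, five exceptions each
clustered = [0] * 100 + [1] * 5 + [0] * 145                  # five consecutive failures
spread    = [1 if day % 50 == 25 else 0 for day in range(250)]   # evenly spaced failures
lr_clustered = christoffersen_independence_lr(clustered)
lr_spread    = christoffersen_independence_lr(spread)
print(f"clustered: {lr_clustered:.2f}   spread: {lr_spread:.2f}   (crit 3.841)")
```

Both series have the same exception count, so an unconditional count-based test cannot tell them apart; only the transition counts separate them.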

Regulators encode a blunter, more operational version: the Basel traffic light. Count exceptions over the last 250 trading days at 99% and bucket the firm:

Zone	Exceptions (250 days, 99%)	Cumulative P(≤x) under correct model	Consequence
Green	0 – 4	up to ≈ 89.2%	Model accepted; no capital penalty
Yellow	5 – 9	≈ 95.9% – 99.97%	Multiplier raised (≈ +0.4 to +0.85); supervisory scrutiny
Red	10+	≥ 99.99%	Model presumed flawed; max multiplier, remediation demanded

The boundaries are not round numbers; they come from the Binomial. At 99% over 250 days the expected count is 2.5, and the probability of seeing 10 or more exceptions if the model is correct is under 0.03%, so 10 exceptions is overwhelming evidence that the model is wrong, not bad luck. The yellow zone is the honest middle: bad enough to worry, not damning enough to condemn. It starts at 5 because that is the first count at which the cumulative probability under a correct model passes 95%. The asymmetry (no penalty for too few exceptions) is deliberate: the supervisor cares about under-stated risk, not over-caution.
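The zone boundaries can be reproduced from the Binomial of EQ Q6.6 with nothing more than `math.comb`. A minimal sketch, with `binom_cdf` as an illustrative helper:

```python
import math

def binom_cdf(x, T, p):
    """P(X <= x) for X ~ Binomial(T, p)."""
    return sum(math.comb(T, k) * p**k * (1 - p)**(T - k) for k in range(x + 1))

T, p = 250, 0.01
for x in (4, 5, 9, 10):
    at_most  = binom_cdf(x, T, p)
    at_least = 1 - binom_cdf(x - 1, T, p)
    print(f"P(X <= {x:2d}) = {at_most:.4%}   P(X >= {x:2d}) = {at_least:.4%}")
```

Reading the two columns side by side shows how quickly the probability of "this many or more" collapses between the bottom of the yellow zone and the start of the red zone.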

Under the Basel backtest you observe a 99% VaR over $T = 250$ trading days. If the model is correctly calibrated, how many exceptions do you expect on average? (Use EQ Q6.6.)

$p = 1 - 0.99 = 0.01$, so $\mathbb{E}[x] = pT = 0.01 \times 250 = $ 2.5. The green zone (0–4) brackets this expectation; 5+ tips into yellow because observing that many exceptions becomes improbable under a correct model.

PYTHON · RUNNABLE IN-BROWSER

# Backtest VaR: count exceedances vs expected, Kupiec POF + Basel zone
import numpy as np
rng = np.random.default_rng(1)

T, alpha = 250, 0.99
p = 1 - alpha                                       # tail probability = 0.01

# A model whose VaR is too LOOSE: true vol 20% above the VaR forecast's vol
sd_true, sd_model = 0.012, 0.010
losses   = -sd_true * rng.standard_normal(T)        # realized losses
var_fcst =  2.326 * sd_model                        # constant 99% Gaussian VaR

x   = int((losses > var_fcst).sum())                # exceptions
exp = p * T                                         # expected exceptions (EQ Q6.6)

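# Kupiec POF likelihood-ratio (EQ Q6.7), in log space; 0*log(0) is treated as 0
ph = x / T
ll_model = (T - x) * np.log(1 - p) + x * np.log(p)      # log-likelihood at claimed rate p
ll_obs   = ((T - x) * np.log(1 - ph) if x < T else 0.0) + (x * np.log(ph) if x > 0 else 0.0)
LR  = -2 * (ll_model - ll_obs)                          # log-likelihood at observed rate is ll_obs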

zone = "GREEN" if x <= 4 else ("YELLOW" if x <= 9 else "RED")
print(f"exceptions observed : {x}   (expected {exp:.1f})")
print(f"Kupiec LR_uc        : {LR:.3f}   crit chi2(1,95%) = 3.841 "
      f"-> {'REJECT model' if LR > 3.841 else 'cannot reject'}")
print(f"Basel traffic light : {zone}")


INSTRUMENT Q6.3 — VaR BACKTEST EXCEEDANCE COUNTER · 250-DAY P&L vs VaR LINE · BASEL ZONE · EQ Q6.6–Q6.7

[Interactive figure. Controls: VaR confidence α (default 99.0%), model miscalibration (default 1.00×). Readouts: exceptions / expected, Kupiec LR (critical value 3.841), Basel zone.]

250 days of deterministic synthetic losses (bars) against a flat 99% VaR line; bars that punch through it are exceptions. Miscalibration 1.0× is an honest model — expect about 2–3 exceptions, comfortably green. Drag it above 1.0 to make true risk outrun the VaR forecast: exceptions multiply, the Kupiec LR climbs past 3.841, and the zone flips yellow then red. Drag below 1.0 to over-state risk and watch exceptions vanish — safe for the firm, but a Kupiec failure for being too conservative and wasting capital.

6.5

Stress testing & the Basel context

VaR and ES are statistical measures: they extrapolate from a sampled or modelled distribution, and they are only as good as that distribution's grip on the future. Their structural blind spot is the event the data never contained — a regime that has not happened in the window, or has not happened at all. Stress testing is the deliberate complement: instead of asking "what does the distribution say about the tail?", it asks "what happens to the book under this specific scenario, whether or not the distribution thinks it likely?"

Historical scenarios. Replay a named crisis through today's book: the 1987 crash, the 2008 collapse, the 2020 COVID shock, the 2022 rate spike. No probability is attached — you simply reprice under those moves. Answers "if October 2008 happened again to this portfolio, what would we lose?"
Hypothetical scenarios. Constructed shocks the past never delivered: a coordinated 300 bp rate move with equities down 25% and credit spreads doubling. Forces the desk to imagine correlations snapping to one — the exact failure mode VaR's covariance matrix smooths over.
Reverse stress testing. Invert the question: what scenario breaks us? Solve for the set of market moves that exhausts capital, then judge how plausible that scenario is. Supervisors began requiring it after 2008 precisely because forward stress tests tend to test the shocks management already fears, not the ones that kill.
Sensitivity / factor shocks. Bump one risk factor at a time (parallel yield-curve shift, vol surface up 10 points) to map where the book is most exposed — the macro cousin of the Greeks from Quant 03.
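The scenario types above share one mechanical core: apply a vector of factor shocks to the book and read off the loss, with no probability attached. The sketch below assumes a purely linear book described by factor sensitivities. Every number, factor name and the capital figure is hypothetical, and a real desk would fully reprice instead of using a linear map. The reverse stress test is shown as a bisection on the scenario multiple, so that the same search would still work with a non-linear repricing function.

```python
# Hypothetical book: P&L in $M per unit move in each risk factor
sensitivities = {
    "equity_pct":        1.20,    # per +1% move in equities
    "rates_bp":         -0.05,    # per +1 bp parallel shift in yields
    "credit_spread_bp": -0.08,    # per +1 bp widening in credit spreads
}

# Hypothetical scenario: equities -25%, rates +300 bp, spreads +150 bp
scenario = {"equity_pct": -25.0, "rates_bp": 300.0, "credit_spread_bp": 150.0}

def scenario_loss(book, shocks, scale=1.0):
    """Loss (positive = money lost) when the shocks, times scale, hit a linear book."""
    pnl = sum(book[f] * scale * shocks.get(f, 0.0) for f in book)
    return -pnl

def reverse_stress_scale(book, shocks, capital, hi=10.0, tol=1e-9):
    """Smallest multiple of the scenario whose loss exhausts capital (bisection)."""
    if scenario_loss(book, shocks, hi) < capital:
        return None                        # even hi x the scenario does not break the book
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if scenario_loss(book, shocks, mid) >= capital:
            hi = mid
        else:
            lo = mid
    return hi

loss  = scenario_loss(sensitivities, scenario)
scale = reverse_stress_scale(sensitivities, scenario, capital=40.0)
print(f"scenario loss: {loss:.1f}   multiple of scenario that exhausts capital: {scale:.3f}")

# One-factor sensitivity shocks: which single factor hurts most in this scenario?
for factor, shock in scenario.items():
    print(f"{factor:18s} alone: {scenario_loss(sensitivities, {factor: shock}):.1f}")
```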

WHY STRESS TESTS EXIST

VaR answers "how bad on a normal-ish bad day?"; stress testing answers "how bad on the day the model is wrong?" The 2007–09 crisis was a catalogue of VaR's limits: short estimation windows had not seen a housing collapse, Gaussian copulas mis-priced tail correlation, and liquidity vanished from instruments the models assumed tradeable. Banks reporting comfortable VaRs lost multiples of them. Stress testing is the institutional memory the rolling window keeps erasing.

The regulatory arc reflects exactly the lessons of this chapter. The 1996 Market Risk Amendment let banks use internal 99% / 10-day VaR models for capital, with the traffic-light backtest of §6.4 as the discipline. After 2008, Basel 2.5 bolted on a stressed VaR (the model re-estimated over a crisis window) to fight the procyclicality of short windows. Then the Fundamental Review of the Trading Book (FRTB, finalized 2019) made the decisive move:

EQ Q6.8 — FRTB EXPECTED SHORTFALL CAPITAL (SCHEMATIC) $$ \mathrm{ES}_{97.5\%}^{\text{stressed}} \;=\; \text{ES at } \alpha = 0.975 \text{, calibrated to a stress period, with liquidity-horizon scaling} $$

FRTB replaces 99% VaR with 97.5% Expected Shortfall as the capital measure, calibrated to a period of significant stress and scaled by instrument-specific liquidity horizons. The level shift (99% → 97.5%) is deliberate: for a normal distribution, 97.5% ES $\approx$ 99% VaR, so the change is meant to recover similar magnitudes while gaining ES's tail-sensitivity and coherence. VaR does not disappear — exception counting on a 99% VaR still drives the backtesting and the green/yellow/red model-approval test. ES sets the capital; VaR still polices the model.
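The claim that 97.5% ES roughly matches 99% VaR for a normal distribution can be verified in a few lines, using the Gaussian ES formula of EQ Q6.5 and the standard library's `NormalDist`. A quick sketch:

```python
from statistics import NormalDist

n = NormalDist()
var_99 = n.inv_cdf(0.99)                              # Gaussian 99% VaR multiplier
es_975 = n.pdf(n.inv_cdf(0.975)) / (1 - 0.975)        # Gaussian 97.5% ES multiplier (EQ Q6.5)
print(f"VaR 99%: {var_99:.3f} sigma   ES 97.5%: {es_975:.3f} sigma   ratio: {es_975 / var_99:.4f}")
```

The match holds only under normality; with fatter tails the 97.5% ES moves above the 99% VaR, which is the tail sensitivity the switch was meant to buy.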

THE HONEST CAVEAT

No single number is the truth. ES is coherent but harder to backtest and still distribution-dependent. Stress tests are scenario-dependent and can become theatre — testing the shocks everyone already prices in. Historical VaR re-fights the last war; parametric VaR assumes a bell curve markets do not obey. The competent risk function runs all of them, treats each as a partial view, and reserves its deepest distrust for any meeting where one number is presented as the risk. The 2008 survivors were not those with the lowest VaR — they were those who did not believe it.

Under the FRTB framework (EQ Q6.8), the regulatory capital measure for market risk is Expected Shortfall calibrated at what confidence level $\alpha$, in percent? (The level chosen so that, for a normal distribution, it roughly matches the old 99% VaR.)

FRTB sets the capital measure at 97.5% ES. For a Gaussian loss, $\mathrm{ES}_{97.5\%} \approx \mathrm{VaR}_{99\%}$ (the ES averaging at the lower confidence reaches about the same magnitude as VaR at the higher one), so the regime gains coherence and tail-sensitivity without a wholesale change in capital magnitude.

You have reached the end of the Quantitative Finance volume. From the stochastic processes that model a price (Quant 01), through binomial and Black–Scholes pricing (Quant 02–03), interest-rate models and Monte-Carlo valuation (Quant 04–05), to the risk measurement that governs whether any of it is safe to hold (Quant 06), the loop is closed: model the world, price the claim, then measure honestly how wrong the model can be.

6.R

References

Artzner, P., Delbaen, F., Eber, J.-M. & Heath, D. (1999). Coherent Measures of Risk. Mathematical Finance 9(3) — the four coherence axioms; why VaR fails subadditivity and ES does not.
J.P. Morgan / Reuters (1996). RiskMetrics — Technical Document (4th ed.). The document that made parametric VaR an industry standard: EWMA covariance, Gaussian quantiles, √-time scaling.
Kupiec, P. H. (1995). Techniques for Verifying the Accuracy of Risk Measurement Models. Journal of Derivatives 3(2) — the proportion-of-failures likelihood-ratio backtest (EQ Q6.7).
Basel Committee on Banking Supervision (2019). Minimum Capital Requirements for Market Risk (FRTB, finalized). BIS d457 — the switch to 97.5% stressed Expected Shortfall with liquidity-horizon scaling (EQ Q6.8).
Christoffersen, P. F. (1998). Evaluating Interval Forecasts. International Economic Review 39(4) — the conditional-coverage / independence test that complements Kupiec.
Rockafellar, R. T. & Uryasev, S. (2000). Optimization of Conditional Value-at-Risk. Journal of Risk 2(3) — CVaR/ES as a convex, optimizable risk measure.
Hull, J. C. (2021). Options, Futures, and Other Derivatives (11th ed.). Pearson — the standard practitioner treatment of VaR, ES, historical simulation and backtesting.