Volatility clustering — the stylized fact
Plot the daily returns of any liquid asset and one thing jumps out: the wild days come in bunches. October 2008, March 2020, August 2024: each is a dense thicket of large moves up and down, separated by long stretches of placid drift. Mandelbrot noticed it in 1963: "large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes." This is volatility clustering, and it is among the most robust empirical regularities in finance.
The classical models of the previous chapters cannot represent it. They assume homoscedasticity — a constant variance \(\sigma^2\) for the noise term. Under that assumption a calm Tuesday and a panicked Thursday are draws from the same distribution, which is plainly false. What clustering demands instead is conditional heteroscedasticity: a variance that changes through time and, crucially, is predictable from the past even when the return itself is not.
Two more facts complete the picture and motivate everything below. First, the unconditional return distribution has fat tails — far more 4σ and 5σ days than a Gaussian allows — even when daily returns are conditionally normal, because mixing normals of different variances manufactures kurtosis for free. Second, in equity markets volatility responds asymmetrically: a 3% drop raises tomorrow's expected volatility more than a 3% gain does. That leverage effect (§4.4) is why the family did not stop at GARCH.
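The "kurtosis for free" claim can be checked numerically. The sketch below is an illustration under assumed numbers, not part of any fitted model: it draws returns that are exactly Gaussian given the day's volatility, but lets that volatility switch at random between a calm level and a turbulent one (the 1% and 3% levels and the 10% mixing weight are hypothetical). This is a plain i.i.d. mixture with no clustering, which isolates the fat-tail effect from the memory effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed illustrative regimes: 90% calm days (1% vol), 10% turbulent days (3% vol)
calm_vol, wild_vol, p_wild = 0.01, 0.03, 0.10

# Each return is Gaussian *given* its own volatility
sigma = np.where(rng.random(n) < p_wild, wild_vol, calm_vol)
r = sigma * rng.standard_normal(n)

def kurtosis(x):
    """Sample kurtosis; a Gaussian has kurtosis 3."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4))

kurt_mixture = kurtosis(r)                        # pooled sample of mixed variances
kurt_gaussian = kurtosis(rng.standard_normal(n))  # constant-variance benchmark

print(f"kurtosis, constant variance : {kurt_gaussian:.2f}")
print(f"kurtosis, mixed variances   : {kurt_mixture:.2f}")
```

For a scale mixture of normals the population kurtosis is \(3\,\mathbb{E}[\sigma^4]/\mathbb{E}[\sigma^2]^2\), which for these assumed numbers is about 8.3, so the pooled sample should land well above the Gaussian value of 3.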
A subtlety worth stating up front: GARCH does not predict returns, and it would be a category error to expect it to. It predicts the scale of returns — the width of tomorrow's distribution, not its center. That is exactly the quantity risk management, option pricing, and position sizing actually need.
ARCH — conditional variance
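Robert Engle's 1982 insight, which earned him a share of the 2003 Nobel, was to let the variance of the current shock depend on the magnitudes of recent shocks. Write the return (after removing any mean) as a standardized innovation scaled by a time-varying volatility:

\[ \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0,1), \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i\,\varepsilon_{t-i}^2, \]

with \(\omega > 0\) and \(\alpha_i \ge 0\) so that the variance stays positive. This is the ARCH(\(q\)) model.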
ARCH works, but it is clumsy. Real volatility persistence decays over many weeks, so capturing it with a finite sum of squared shocks forces a large \(q\) — often 5 to 10 lags — and a long parameter vector that is awkward to estimate and prone to overfitting. The model also imposes that variance reacts only to a fixed, short window of past shocks, with hard cutoffs. Engle's student Bollerslev fixed both problems in one stroke.
Parameters are fit by maximum likelihood: choose \((\omega, \alpha)\) to maximize the Gaussian log-likelihood of the observed returns under the recursively computed \(\sigma_t^2\). The objective is non-linear but smooth, and the runnable cells below show the recursion that any optimizer would evaluate at each step.
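To make the objective concrete, here is a minimal sketch of the Gaussian negative log-likelihood for the simplest case, ARCH(1). The function name `arch1_neg_loglik`, the choice to seed the first variance with the sample variance, and the simulated data are all assumptions made for illustration; a production fit would typically rely on a dedicated library.

```python
import numpy as np

def arch1_neg_loglik(params, eps):
    """Negative Gaussian log-likelihood of demeaned returns eps under ARCH(1)."""
    omega, alpha = params
    if omega <= 0 or alpha < 0 or alpha >= 1:
        return np.inf                        # outside the admissible region
    s2 = np.empty_like(eps)
    s2[0] = eps.var()                        # assumed seed: the sample variance
    s2[1:] = omega + alpha * eps[:-1]**2     # sigma_t^2 from yesterday's squared shock
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + eps**2 / s2)

# Simulate ARCH(1) data with known (hypothetical) parameters
rng = np.random.default_rng(0)
omega_true, alpha_true = 1e-4, 0.4
eps = np.zeros(2000)
for t in range(1, eps.size):
    s2_t = omega_true + alpha_true * eps[t - 1]**2
    eps[t] = np.sqrt(s2_t) * rng.standard_normal()

nll_true = arch1_neg_loglik((omega_true, alpha_true), eps)
nll_flat = arch1_neg_loglik((eps.var(), 0.0), eps)   # constant-variance alternative

print(f"neg. log-lik at true parameters : {nll_true:.1f}")
print(f"neg. log-lik, constant variance : {nll_flat:.1f}")
```

The time-varying variance should score a lower negative log-likelihood than the constant-variance alternative. Handing this function to a general-purpose minimizer such as `scipy.optimize.minimize` is one way to obtain the estimates.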
GARCH(1,1) — the workhorse
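The Generalized ARCH model adds one term, yesterday's variance, and that single addition is why GARCH(1,1) has been the default for forty years. It captures slow-decaying persistence with just three parameters, a parsimony-to-realism ratio that is hard to beat:

\[ \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad \omega > 0,\ \alpha \ge 0,\ \beta \ge 0. \]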
Two derived quantities carry most of the intuition. The persistence is the sum \(\alpha+\beta\): it governs how slowly a volatility shock dies out, and for daily equity indices it is famously close to one, typically \(0.95\) to \(0.99\). The unconditional (long-run) variance is the level the recursion reverts to:

\[ \bar\sigma^2 = \frac{\omega}{1-\alpha-\beta}, \qquad \text{which requires } \alpha+\beta < 1. \]
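A quick worked example puts numbers on both quantities. It uses the same illustrative parameters as the simulation in this section. The half-life is a derived quantity not defined in the text: it is taken here to mean the number of days \(h\) for which \((\alpha+\beta)^h = \tfrac12\), that is, the time for the gap between expected variance and its long-run level to halve.

```python
import math

omega, alpha, beta = 1e-5, 0.10, 0.85        # illustrative values, not estimates

persistence = alpha + beta
var_lr = omega / (1 - persistence)           # long-run (unconditional) variance
half_life = math.log(0.5) / math.log(persistence)

print(f"persistence           : {persistence:.2f}")
print(f"long-run daily vol    : {math.sqrt(var_lr):.4f}")
print(f"annualized (252 days) : {math.sqrt(var_lr * 252):.1%}")
print(f"half-life of a shock  : {half_life:.1f} days")

# Persistence closer to one stretches the half-life sharply
for p in (0.95, 0.97, 0.99):
    print(f"  alpha+beta = {p:.2f} -> half-life {math.log(0.5) / math.log(p):5.1f} days")
```

At a persistence of 0.95 a shock's excess variance halves in roughly two to three trading weeks; at 0.99 it takes about three months, which is why small differences near one matter so much in practice.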
# Simulate a GARCH(1,1) process; plot the conditional vol
import numpy as np

rng = np.random.default_rng(1)
omega, alpha, beta = 1e-5, 0.10, 0.85    # persistence alpha+beta = 0.95
T = 600
var_lr = omega / (1 - alpha - beta)      # long-run (unconditional) variance
r = np.zeros(T)
s2 = np.zeros(T); s2[0] = var_lr         # start at the long-run level
for t in range(1, T):
    s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1]   # today's variance from yesterday's shock and variance
    r[t] = np.sqrt(s2[t]) * rng.standard_normal()        # return = volatility * standardized innovation
vol = np.sqrt(s2)
print(f"long-run daily vol : {np.sqrt(var_lr):.4f} (annualized ~{np.sqrt(var_lr)*np.sqrt(252):.1%})")
print(f"realised daily vol : {r.std():.4f}")
print(f"max |return| : {np.abs(r).max():.4f} on day {int(np.argmax(np.abs(r)))}")
print(f"corr(r, lag r) : {np.corrcoef(r[1:], r[:-1])[0,1]:+.3f} (near 0: level is noise)")
print(f"corr(r^2, lag r^2) : {np.corrcoef(r[1:]**2, (r[:-1])**2)[0,1]:+.3f} (positive: magnitude has memory)")
# plot_xy is assumed to be the plotting helper supplied by the page's runtime
plot_xy(list(range(T)), vol)             # the clustering, made visible
# Run the GARCH(1,1) variance recursion on returns; print the one-step vol
import numpy as np

omega, alpha, beta = 1e-5, 0.10, 0.85
# a short return series with two violent days mid-sample (a shock arriving)
r = np.array([ 0.004, -0.006,  0.002, -0.003,  0.005,
              -0.028,  0.031, -0.004,  0.007, -0.002])
var_lr = omega / (1 - alpha - beta)
s2 = var_lr                                  # seed day 0 at the long-run variance
print(" day  return   sigma^2       sigma (daily vol)")
for t, rt in enumerate(r):
    # s2 is the variance in force on day t, fixed before rt is observed
    print(f" {t:2d}  {rt:+.4f}  {s2:.6e}  {np.sqrt(s2):.4%}")
    s2 = omega + alpha * rt**2 + beta * s2   # EQ T4.4: variance for day t+1
# after the loop, s2 already reflects the LAST observed shock and variance,
# so it is the one-step-ahead forecast; updating again would count r[-1] twice
s2_next = s2
print(f"\none-step-ahead sigma^2 : {s2_next:.6e}")
print(f"one-step-ahead vol : {np.sqrt(s2_next):.4%} (note the spike that lingers)")
Asymmetry — EGARCH & GJR-GARCH
Plain GARCH has a blind spot baked into its algebra: it reacts to \(\varepsilon_{t-1}^2\), and squaring throws away the sign. A −4% day and a +4% day produce identical forecasts. But equity volatility is emphatically not symmetric — bad news raises future volatility far more than equally-sized good news. This leverage effect (a falling stock raises its debt-to-equity ratio, mechanically raising risk; and falling prices trigger forced selling and fear) is one of the most reliable patterns in markets, and two extensions of GARCH were built to capture it.
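GJR-GARCH (Glosten–Jagannathan–Runkle, 1993) is the minimal fix: add one term that switches on only for negative shocks.

\[ \sigma_t^2 = \omega + \left(\alpha + \gamma\,\mathbf{1}_{\{\varepsilon_{t-1} < 0\}}\right)\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 \]

A positive \(\gamma\) means a negative shock carries the weight \(\alpha+\gamma\) while a positive shock of the same size carries only \(\alpha\).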
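EGARCH (Nelson, 1991) takes a different route: model the log of variance, which guarantees positivity without any constraints on the signs of the coefficients, and let the news term depend on both the magnitude and the sign of the standardized shock \(z_{t-1} = \varepsilon_{t-1}/\sigma_{t-1}\).

\[ \ln\sigma_t^2 = \omega + \beta\,\ln\sigma_{t-1}^2 + \alpha\left(|z_{t-1}| - \mathbb{E}|z_{t-1}|\right) + \gamma\,z_{t-1} \]

Here the leverage effect corresponds to \(\gamma < 0\): a negative \(z_{t-1}\) then pushes log-variance up.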
The news-impact curve — next period's variance plotted against this period's shock, holding \(\sigma_{t-1}^2\) fixed — is the cleanest way to see the difference. Plain GARCH gives a symmetric parabola centered at zero; GJR and EGARCH tilt it, steepening the left (bad-news) arm.
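The news-impact curve is easy to tabulate directly. The sketch below compares plain GARCH(1,1) with GJR-GARCH only (EGARCH is omitted to keep the example short). All parameter values are hypothetical, and the GJR pair is chosen so that \(\alpha_{\text{GJR}} + \gamma/2\) equals the GARCH \(\alpha\), which makes the two curves agree on average and differ only in their tilt.

```python
import numpy as np

# Illustrative parameters (assumed, not estimated from data)
omega, alpha, beta = 1e-5, 0.10, 0.85       # plain GARCH(1,1)
alpha_gjr, gamma = 0.04, 0.12               # GJR: alpha_gjr + gamma/2 == alpha
s2_prev = omega / (1 - alpha - beta)        # hold sigma_{t-1}^2 fixed at the long-run level

def nic_garch(eps):
    """Next-period variance under GARCH(1,1) as a function of today's shock."""
    eps = np.asarray(eps, dtype=float)
    return omega + alpha * eps**2 + beta * s2_prev

def nic_gjr(eps):
    """Next-period variance under GJR-GARCH: extra weight gamma on negative shocks."""
    eps = np.asarray(eps, dtype=float)
    bad_news = (eps < 0).astype(float)
    return omega + (alpha_gjr + gamma * bad_news) * eps**2 + beta * s2_prev

shocks = np.linspace(-0.04, 0.04, 9)
print(" shock    GARCH vol   GJR vol")
for e, g, j in zip(shocks, nic_garch(shocks), nic_gjr(shocks)):
    print(f"{e:+.2f}   {np.sqrt(g):.4%}   {np.sqrt(j):.4%}")
```

Reading down the table, the GARCH column is a mirror image around the zero shock, while the GJR column is higher for each negative shock than for the positive shock of the same size.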
Which to use is genuinely contested. GJR is simpler, nests GARCH cleanly (test \(\gamma=0\)), and is easy to forecast, so most practitioners reach for it first. EGARCH is more flexible and unconstrained but harder to fit and to project forward, and its log scale makes the parameters less directly interpretable. Hansen & Lunde's 2005 horse race across 330 ARCH-type models found that for daily exchange-rate data nothing reliably beat a plain GARCH(1,1), while for IBM stock returns GARCH(1,1) was clearly inferior to models that accommodate a leverage effect. The lesson cuts both ways: asymmetry earns its keep in equities, and elsewhere the result is a useful humility check against over-engineering.
Forecasting volatility & the VaR link
The payoff of a fitted GARCH model is a forecast of future variance, and the recursion makes multi-step forecasts almost free. The one-step forecast is just the recursion evaluated at the last observed values. For horizons beyond one, the unknown future shock \(\varepsilon_{t+h-1}^2\) is replaced by its expectation, which under the model is the forecast variance itself, collapsing the whole thing to clean geometric mean reversion toward \(\bar\sigma^2\):

\[ \mathbb{E}_t\!\left[\sigma_{t+h}^2\right] = \bar\sigma^2 + (\alpha+\beta)^{h-1}\left(\sigma_{t+1}^2 - \bar\sigma^2\right), \qquad h \ge 1. \]
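The mean-reverting forecast can be sketched in a few lines. The function name `garch_forecast` and the starting point (a one-step variance four times the long-run level, so volatility is doubled) are assumptions for illustration. The loop applies the recursion with the future squared shock replaced by the forecast variance, and the result is checked against the closed-form geometric decay.

```python
import numpy as np

omega, alpha, beta = 1e-5, 0.10, 0.85        # illustrative values, not estimates
var_lr = omega / (1 - alpha - beta)          # long-run variance

def garch_forecast(s2_next, horizon):
    """Variance forecasts for steps 1..horizon, given the one-step forecast s2_next."""
    path = np.empty(horizon)
    path[0] = s2_next
    for h in range(1, horizon):
        # E[eps^2] equals the forecast variance, so alpha and beta act on the same number
        path[h] = omega + (alpha + beta) * path[h - 1]
    return path

s2_next = 4 * var_lr                         # assumed stressed start: vol at 2x long-run
path = garch_forecast(s2_next, 60)

# Closed form: geometric decay of the gap to the long-run variance
h = np.arange(1, 61)
closed = var_lr + (alpha + beta) ** (h - 1) * (s2_next - var_lr)

for step in (1, 5, 10, 20, 60):
    print(f"h = {step:2d}   forecast vol {np.sqrt(path[step - 1]):.4%}")
print(f"long-run vol         {np.sqrt(var_lr):.4%}")
print(f"10-day vol (sum of daily variances) {np.sqrt(path[:10].sum()):.4%}")
```

Summing the daily variance forecasts, as in the last line, is the usual way to get a multi-day horizon variance when returns are serially uncorrelated.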
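This term structure is precisely what an option's implied-volatility surface tries to price, and what a risk system needs to project losses over a 1-day or 10-day horizon. The most consequential application is Value-at-Risk (VaR): the loss threshold a portfolio will not exceed with probability \(1-p\) over a given horizon. Plug the GARCH conditional volatility into the quantile of the innovation distribution:

\[ \mathrm{VaR}_{t+1}^{p} = -\left(\mu_{t+1} + \sigma_{t+1}\,q_p\right), \]

where \(\mu_{t+1}\) is the conditional mean (often set to zero for daily data) and \(q_p\) is the \(p\)-quantile of \(z_t\), about \(-2.33\) for Gaussian innovations at \(p = 0.01\).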
Why a conditional VaR matters. A static VaR built on a trailing 250-day window treats March 2020 and a sleepy summer as equally likely tomorrow. It under-warns going into a crisis (the window is still full of calm days) and over-warns coming out of one (the window is still full of the crash). GARCH-based VaR reacts within a day or two because \(\sigma_t\) is recomputed every step — the practical reason banks adopted conditional volatility models for capital after 1996.
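A small numerical sketch shows how directly the conditional volatility feeds the VaR number. It assumes Gaussian innovations and a zero conditional mean by default; the helper name `garch_var` and the two volatility levels are hypothetical, standing in for one-step GARCH forecasts on a quiet day and a stressed day.

```python
from statistics import NormalDist

def garch_var(sigma_next, p=0.01, mu=0.0):
    """One-day VaR as a positive loss fraction, assuming Gaussian innovations."""
    q = NormalDist().inv_cdf(p)              # p-quantile of z (negative for small p)
    return -(mu + sigma_next * q)

calm_vol, stressed_vol = 0.010, 0.035        # hypothetical one-step GARCH vol forecasts

var_calm = garch_var(calm_vol)
var_stressed = garch_var(stressed_vol)

print(f"99% 1-day VaR, calm day     : {var_calm:.2%} of portfolio value")
print(f"99% 1-day VaR, stressed day : {var_stressed:.2%} of portfolio value")
```

Because VaR is linear in \(\sigma_{t+1}\) under this setup, the threshold scales one-for-one with the volatility forecast. Swapping the Gaussian quantile for a fatter-tailed one (a Student-t, for example) would change only the line that computes `q`.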
GARCH is not the last word. It assumes the variance process is driven only by past returns; realized-volatility models (HAR-RV) instead feed high-frequency intraday data straight in and routinely forecast better. Stochastic-volatility models give variance its own innovation term rather than making it a deterministic function of past shocks — more flexible, harder to estimate. And implied volatility from options markets is forward-looking in a way no return-based model can be. But for a three-parameter model you can fit in milliseconds and explain on a napkin, GARCH(1,1) remains the benchmark every richer model must beat — and frequently does not.
We have modeled the volatility of one series in isolation — but risk lives in how series move together. A portfolio's variance is a quadratic form in a whole covariance matrix, and in a crisis correlations snap toward one exactly when diversification is supposed to save you. Chapter 05 turns the dial from one dimension to many: Vector Autoregression (VAR) for the joint dynamics of several series, the cross-correlations and Granger causality they encode, and the multivariate-GARCH machinery (DCC) that lets the whole covariance matrix breathe through time.
References
- Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.
- Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity.
- Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach.
- Glosten, L. R., Jagannathan, R. & Runkle, D. E. (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.
- Hansen, P. R. & Lunde, A. (2005). A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?
- Engle, R. F. (2002). Dynamic Conditional Correlation: A Simple Class of Multivariate GARCH Models.
- Mandelbrot, B. (1963). The Variation of Certain Speculative Prices.