Volatility Modeling — ARCH & GARCH

4.1

Volatility clustering — the stylized fact

Plot the daily returns of any liquid asset and one thing jumps out: the wild days come in bunches. October 2008, March 2020, August 2024 — each is a dense thicket of large moves up and down, separated by long stretches of placid drift. Mandelbrot noticed it in 1963: "large changes tend to be followed by large changes, of either sign, and small changes by small changes." This is volatility clustering, and it is the single most robust empirical regularity in all of finance.

The classical models of the previous chapters cannot represent it. They assume homoscedasticity — a constant variance $\sigma^2$ for the noise term. Under that assumption a calm Tuesday and a panicked Thursday are draws from the same distribution, which is plainly false. What clustering demands instead is conditional heteroscedasticity: a variance that changes through time and, crucially, is predictable from the past even when the return itself is not.

EQ T4.1 — THE STYLIZED FACTS, FORMALLY $$ \mathrm{Corr}(r_t,\, r_{t-k}) \approx 0 \qquad\text{but}\qquad \mathrm{Corr}(r_t^2,\, r_{t-k}^2) > 0 \;\; \text{for many lags } k $$

Raw returns $r_t$ are approximately serially uncorrelated (you cannot predict tomorrow's sign; markets are near-efficient). Yet squared (or absolute) returns are strongly positively autocorrelated, and that autocorrelation decays slowly. The level is noise; the magnitude has memory. Returns are also fat-tailed (leptokurtic) and, in equities, negatively skewed. A model of volatility should reproduce all three: the clustering, the fat tails, and the asymmetry.

Two more facts complete the picture and motivate everything below. First, the unconditional return distribution has fat tails — far more 4σ and 5σ days than a Gaussian allows — even when daily returns are conditionally normal, because mixing normals of different variances manufactures kurtosis for free. Second, in equity markets volatility responds asymmetrically: a 3% drop raises tomorrow's expected volatility more than a 3% gain does. That leverage effect (§4.4) is why the family did not stop at GARCH.
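The claims so far can be checked on a toy series before any model is introduced. The sketch below is a hypothetical illustration, not market data: returns are drawn as conditionally normal with a volatility that switches between an assumed calm level and an assumed turbulent level in blocks. It uses only the standard library, and the vol levels, block lengths, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(7)

# Hypothetical two-regime volatility path: long calm stretches, short turbulent ones.
# On every day the return is conditionally normal with mean zero.
CALM_VOL, WILD_VOL = 0.006, 0.025      # assumed daily vols, for illustration only
returns = []
for block in range(40):
    vol = WILD_VOL if block % 4 == 3 else CALM_VOL    # every fourth block is turbulent
    returns.extend([random.gauss(0.0, vol) for _ in range(50)])

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = statistics.fmean(x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def kurtosis(x):
    """Sample kurtosis (a Gaussian has kurtosis 3)."""
    m = statistics.fmean(x)
    m2 = statistics.fmean([(v - m) ** 2 for v in x])
    m4 = statistics.fmean([(v - m) ** 4 for v in x])
    return m4 / m2 ** 2

squared = [r * r for r in returns]
for lag in (1, 2, 5, 10):
    print(f"lag {lag:2d}: corr(r) = {autocorr(returns, lag):+.3f}   "
          f"corr(r^2) = {autocorr(squared, lag):+.3f}")
print(f"kurtosis of returns: {kurtosis(returns):.2f}  (Gaussian = 3)")
```

Every individual draw is Gaussian, yet the pooled sample should show kurtosis well above 3 and positively autocorrelated squares. Both effects come purely from the variance changing through time.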

A subtlety worth stating up front: GARCH does not predict returns, and it would be a category error to expect it to. It predicts the scale of returns — the width of tomorrow's distribution, not its center. That is exactly the quantity risk management, option pricing, and position sizing actually need.

4.2

ARCH — conditional variance

Robert Engle's 1982 insight, which earned him a share of the 2003 Nobel Prize in economics, was to let the variance of the current shock depend on the magnitudes of recent shocks. Write the return as a mean $\mu$ plus a shock, where the shock is a standardized innovation scaled by a time-varying volatility:

EQ T4.2 — THE ARCH(q) MODEL $$ r_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t\, z_t, \quad z_t \overset{\text{iid}}{\sim} \mathcal{N}(0,1), \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\, \varepsilon_{t-i}^2 $$

$z_t$ is the unpredictable part — pure white noise of unit variance. All the structure lives in $\sigma_t^2$, the conditional variance: a baseline $\omega > 0$ plus a weighted sum of recent squared shocks. A big move yesterday ($\varepsilon_{t-1}^2$ large) mechanically inflates today's variance, then feeds forward — that is clustering, written as a recursion. For the variance to stay positive we need $\omega > 0,\ \alpha_i\ge 0$; for it to be stationary we need $\sum_i \alpha_i < 1$.
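As a minimal sketch of the recursion in EQ T4.2, the function below computes the conditional variances of an ARCH(q) model in plain Python. The function name, the example shocks, and the parameter values are illustrative assumptions. Seeding the first q variances at the unconditional level is one common convention among several.

```python
def arch_variance(shocks, omega, alphas):
    """Conditional variances sigma_t^2 for an ARCH(q) model (EQ T4.2).

    shocks : list of demeaned returns eps_t
    alphas : [alpha_1, ..., alpha_q]
    The first q variances are seeded at omega / (1 - sum(alphas)).
    """
    q = len(alphas)
    if omega <= 0 or any(a < 0 for a in alphas):
        raise ValueError("need omega > 0 and alpha_i >= 0 for a positive variance")
    if sum(alphas) >= 1:
        raise ValueError("need sum(alpha_i) < 1 for a finite unconditional variance")
    long_run = omega / (1 - sum(alphas))
    sigma2 = [long_run] * min(q, len(shocks))
    for t in range(q, len(shocks)):
        # baseline plus a weighted sum of the q most recent squared shocks
        sigma2.append(omega + sum(a * shocks[t - i] ** 2
                                  for i, a in enumerate(alphas, start=1)))
    return sigma2

eps = [0.004, -0.006, 0.002, -0.030, 0.005, -0.003, 0.001]   # one violent day at t = 3
s2 = arch_variance(eps, omega=2e-5, alphas=[0.3, 0.2])
for t, (e, v) in enumerate(zip(eps, s2)):
    print(f"t={t}  eps={e:+.3f}  sigma2={v:.3e}  sigma={v ** 0.5:.4f}")
```

With q = 2 the shock at t = 3 inflates the variance at t = 4 and t = 5 and has left the window entirely by t = 6.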

ARCH works, but it is clumsy. Real volatility persistence decays over many weeks, so capturing it with a finite sum of squared shocks forces a large $q$ — often 5 to 10 lags — and a long parameter vector that is awkward to estimate and prone to overfitting. The model also imposes that variance reacts only to a fixed, short window of past shocks, with hard cutoffs. Engle's student Bollerslev fixed both problems in one stroke.

Parameters are fit by maximum likelihood: choose $(\omega, \alpha)$ to maximize the Gaussian log-likelihood of the observed returns under the recursively computed $\sigma_t^2$. The objective is non-linear but smooth, and the runnable cells below show the recursion that any optimizer would evaluate at each step.

EQ T4.3 — THE GAUSSIAN LOG-LIKELIHOOD (WHAT MLE MAXIMIZES) $$ \ell(\theta) = -\frac{1}{2}\sum_{t=1}^{T}\!\left[\log(2\pi) + \log \sigma_t^2(\theta) + \frac{\varepsilon_t^2}{\sigma_t^2(\theta)}\right] $$

Each term rewards a $\sigma_t^2$ that is large when the shock is large and small when it is small: the $\varepsilon_t^2/\sigma_t^2$ penalty punishes underestimating a violent day, while $\log\sigma_t^2$ punishes crying wolf on a calm one. Maximizing this is exactly learning to size tomorrow's distribution from today's. Heavy-tailed innovations (Student-t) replace the Gaussian when residuals stay fat-tailed after fitting — common for daily equity data.
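To make EQ T4.3 concrete, here is a rough sketch that evaluates the Gaussian log-likelihood for an ARCH(1) recursion and recovers the parameters of a simulated series. The "true" parameters, the seed, and the grid are assumptions chosen for illustration, and the crude grid search only stands in for a proper numerical optimizer.

```python
import math
import random

def arch1_loglik(eps, omega, alpha):
    """Gaussian log-likelihood of EQ T4.3 under an ARCH(1) variance recursion."""
    sigma2 = omega / (1 - alpha)          # seed at the unconditional variance (a convention)
    total = 0.0
    for e in eps:
        total += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + e * e / sigma2)
        sigma2 = omega + alpha * e * e    # today's shock sets tomorrow's variance
    return total

# Simulate from assumed "true" parameters so there is something to recover.
random.seed(3)
TRUE_OMEGA, TRUE_ALPHA = 1e-4, 0.4
eps, sigma2 = [], TRUE_OMEGA / (1 - TRUE_ALPHA)
for _ in range(3000):
    e = math.sqrt(sigma2) * random.gauss(0.0, 1.0)
    eps.append(e)
    sigma2 = TRUE_OMEGA + TRUE_ALPHA * e * e

# Crude grid search: omega in 2e-5 .. 3e-4, alpha in 0.00 .. 0.90.
grid = [(w * 1e-5, a / 20) for w in range(2, 31, 2) for a in range(0, 19)]
best_omega, best_alpha = max(grid, key=lambda p: arch1_loglik(eps, *p))

print(f"true  : omega={TRUE_OMEGA:.1e}  alpha={TRUE_ALPHA:.2f}")
print(f"fitted: omega={best_omega:.1e}  alpha={best_alpha:.2f}")
print(f"loglik at the fit               : {arch1_loglik(eps, best_omega, best_alpha):.1f}")
print(f"loglik, same omega but alpha = 0: {arch1_loglik(eps, best_omega, 0.0):.1f}")
```

The gap between the last two lines is what the likelihood pays for ignoring clustering: a constant variance is too small on violent days and too large on calm ones.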

4.3

GARCH(1,1) — the workhorse

The Generalized ARCH model adds one term — yesterday's variance — and that single addition is why GARCH(1,1) has been the default for forty years. It captures slow-decaying persistence with just three parameters, an unbeatable parsimony-to-realism ratio:

EQ T4.4 — GARCH(1,1) $$ \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad \omega > 0,\ \alpha\ge 0,\ \beta\ge 0,\ \alpha+\beta < 1 $$

Three forces set tomorrow's variance: a constant floor $\omega$; the news / reaction term $\alpha\,\varepsilon_{t-1}^2$ (how hard yesterday's surprise hits); and the memory / persistence term $\beta\,\sigma_{t-1}^2$ (how much of yesterday's variance carries over). Unrolling the recursion shows GARCH(1,1) is an ARCH(∞): an infinite sum of all past squared shocks with geometrically decaying weights $\alpha\beta^{k-1}$ on $\varepsilon_{t-k}^2$, which is exactly why three numbers do the work of ten.
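The unrolling can be verified numerically. The sketch below runs the three-parameter recursion over a short, made-up shock series and then rebuilds the same variance as a geometrically weighted sum of past squared shocks. Because the history is finite, two extra pieces appear: a partial geometric sum of $\omega$ and a fading contribution from the assumed start value.

```python
omega, alpha, beta = 1e-5, 0.10, 0.85
eps = [0.004, -0.006, 0.002, -0.003, 0.005, -0.028, 0.031, -0.004, 0.007, -0.002]
seed_var = omega / (1 - alpha - beta)      # sigma_0^2, an assumed start value

# (1) Run the recursion of EQ T4.4.
s2 = seed_var
for e in eps:
    s2 = omega + alpha * e * e + beta * s2
recursive = s2                              # variance for the day after the last shock

# (2) Rebuild the same number as a weighted sum of past squared shocks.
n = len(eps)
weights = [alpha * beta ** k for k in range(n)]          # weights[k]: shock k days before the last
shock_part = sum(w * e * e for w, e in zip(weights, reversed(eps)))
const_part = omega * (1 - beta ** n) / (1 - beta)        # omega * (1 + beta + ... + beta^(n-1))
seed_part = beta ** n * seed_var                         # fading memory of the start value
unrolled = const_part + shock_part + seed_part

print(f"recursion : {recursive:.10e}")
print(f"unrolled  : {unrolled:.10e}")
print("weights on the five most recent shocks:", [f"{w:.4f}" for w in weights[:5]])
```

The two numbers agree up to floating-point rounding, and the printed weights shrink by a factor of $\beta$ per day of age.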

Two derived quantities carry most of the intuition. The persistence is the sum $\alpha+\beta$: it governs how slowly a volatility shock dies out, and for daily equity indices it is famously close to one — typically $0.95$ to $0.99$. The unconditional (long-run) variance is the level the recursion reverts to:

EQ T4.5 — LONG-RUN VARIANCE & MEAN REVERSION $$ \bar{\sigma}^2 \;=\; \frac{\omega}{1 - \alpha - \beta}, \qquad \sigma_t^2 - \bar{\sigma}^2 \;=\; \alpha\big(\varepsilon_{t-1}^2 - \sigma_{t-1}^2\big) + (\alpha+\beta)\big(\sigma_{t-1}^2 - \bar{\sigma}^2\big) $$

Taking expectations of EQ T4.4 in the stationary state gives $\bar\sigma^2(1-\alpha-\beta)=\omega$. Variance always pulls back toward $\bar\sigma^2$: after a spike it decays, after a lull it rises. The closer $\alpha+\beta$ is to 1, the slower that pull; at $\alpha+\beta=1$ shocks never fully fade (the IGARCH boundary, where $\bar\sigma^2$ is undefined and the EWMA / RiskMetrics model lives). The half-life of a shock, $\log(0.5)/\log(\alpha+\beta)$ days, condenses this into a single number, and it is what a risk manager reads first.
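A small helper makes the three summary numbers easy to tabulate. This is a sketch with an assumed function name and assumed parameter sets; $\omega$ is held at $10^{-5}$ throughout so that only the persistence changes.

```python
import math

def garch_summary(omega, alpha, beta):
    """Persistence, long-run daily vol, and shock half-life (in days) of a GARCH(1,1)."""
    persistence = alpha + beta
    if not 0 < persistence < 1:
        raise ValueError("a finite long-run variance needs 0 < alpha + beta < 1")
    long_run_var = omega / (1 - persistence)
    half_life = math.log(0.5) / math.log(persistence)
    return persistence, math.sqrt(long_run_var), half_life

for alpha, beta in [(0.10, 0.80), (0.10, 0.85), (0.05, 0.92), (0.05, 0.94)]:
    p, vol, hl = garch_summary(1e-5, alpha, beta)
    print(f"alpha+beta={p:.2f}  long-run vol={vol:.4f}  half-life={hl:5.1f} days")
```

Note that with $\omega$ fixed, raising the persistence also raises the long-run vol, because $\bar\sigma^2 = \omega/(1-\alpha-\beta)$. To compare persistence at the same long-run level, $\omega$ has to be rescaled alongside it.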

A GARCH(1,1) model has $\omega = 0.00001$, $\alpha = 0.1$, $\beta = 0.85$. Yesterday's conditional variance was $\sigma_{t-1}^2 = 0.0004$ and yesterday's squared shock was $\varepsilon_{t-1}^2 = 0.0009$. What is today's conditional variance $\sigma_t^2$?

Apply EQ T4.4 term by term: $\alpha\,\varepsilon_{t-1}^2 = 0.1 \times 0.0009 = 0.00009$; $\beta\,\sigma_{t-1}^2 = 0.85 \times 0.0004 = 0.00034$. Sum with $\omega$: $0.00001 + 0.00009 + 0.00034 = $ 0.00044. (Today's volatility is $\sqrt{0.00044} \approx 0.021$, i.e. about a 2.1% daily move.)

True or false: when $\alpha + \beta$ is close to 1, a shock to volatility decays slowly, so today's turbulence stays elevated for a long time. (Answer true or false.)

From EQ T4.5 the expected deviation $\sigma_t^2 - \bar\sigma^2$ shrinks by a factor of $(\alpha+\beta)$ each step. If $\alpha+\beta$ is near 1 that factor is near 1, so the deviation barely shrinks per day and volatility reverts to its mean only over many sessions: high persistence, slow decay. The statement is true.

PYTHON · RUNNABLE IN-BROWSER

# Simulate a GARCH(1,1) process; print summary stats and plot the conditional vol
import numpy as np
rng = np.random.default_rng(1)
omega, alpha, beta = 1e-5, 0.10, 0.85          # persistence alpha+beta = 0.95
T = 600
var_lr = omega / (1 - alpha - beta)            # long-run (unconditional) variance
r   = np.zeros(T)
s2  = np.zeros(T); s2[0] = var_lr              # start at the long-run level
for t in range(1, T):
    s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1]
    r[t]  = np.sqrt(s2[t]) * rng.standard_normal()

vol = np.sqrt(s2)
print(f"long-run daily vol : {np.sqrt(var_lr):.4f}  (annualized ~{np.sqrt(var_lr)*np.sqrt(252):.1%})")
print(f"realised daily vol : {r.std():.4f}")
print(f"max  |return|      : {np.abs(r).max():.4f}   on day {int(np.argmax(np.abs(r)))}")
print(f"corr(r, lag r)     : {np.corrcoef(r[1:], r[:-1])[0,1]:+.3f}  (near 0: level is noise)")
print(f"corr(r^2, lag r^2) : {np.corrcoef(r[1:]**2, (r[:-1])**2)[0,1]:+.3f}  (positive: magnitude has memory)")
plot_xy(list(range(T)), vol)                    # the clustering, made visible


PYTHON · RUNNABLE IN-BROWSER

# Run the GARCH(1,1) variance recursion on returns; print the one-step vol
import numpy as np
omega, alpha, beta = 1e-5, 0.10, 0.85
# a short return series with two violent days in the middle (a shock arrives, then calm returns)
r = np.array([0.004, -0.006, 0.002, -0.003, 0.005,
              -0.028, 0.031, -0.004, 0.007, -0.002])

var_lr = omega / (1 - alpha - beta)
s2 = var_lr                                     # day-0 variance: seed at the long-run level
print(" day   return    sigma^2        sigma (daily vol)")
for t, rt in enumerate(r):
    # s2 is the variance in force on day t, known before rt is observed
    print(f"  {t:2d}  {rt:+.4f}   {s2:.6e}     {np.sqrt(s2):.4%}")
    s2 = omega + alpha * rt**2 + beta * s2      # EQ T4.4: day t's shock sets day t+1's variance

# after the loop s2 already holds the one-step-ahead variance;
# applying EQ T4.4 once more would count the last shock twice
s2_next = s2
print(f"\none-step-ahead sigma^2 : {s2_next:.6e}")
print(f"one-step-ahead vol     : {np.sqrt(s2_next):.4%}  (note the spike that lingers)")


INSTRUMENT T4.1 — GARCH(1,1) SIMULATOR · EQ T4.4 · CLUSTERING & PERSISTENCE · SEEDED
[Interactive instrument. Controls: reaction α (default 0.10), persistence β (default 0.85). Live readouts: α + β (persistence), shock half-life, long-run daily vol.]

The mint line is the conditional volatility $\sigma_t$; the faint grey bars are the simulated returns it scales. Push α up and volatility reacts violently to each shock but the spikes are jagged and short. Push β up and the spikes smooth into long plateaus: memory. As α + β climbs past ~0.97 the half-life balloons (about 23 trading days at 0.97, about 69 at 0.99) and mean reversion becomes hard to see within the simulated window. The limit α + β = 1 is the IGARCH regime, where the model says "today's storm is the new normal until further notice." The same seed is reused so you compare regimes, not luck.

4.4

Asymmetry — EGARCH & GJR-GARCH

Plain GARCH has a blind spot baked into its algebra: it reacts to $\varepsilon_{t-1}^2$, and squaring throws away the sign. A −4% day and a +4% day produce identical forecasts. But equity volatility is emphatically not symmetric — bad news raises future volatility far more than equally-sized good news. This leverage effect (a falling stock raises its debt-to-equity ratio, mechanically raising risk; and falling prices trigger forced selling and fear) is one of the most reliable patterns in markets, and two extensions of GARCH were built to capture it.

GJR-GARCH (Glosten–Jagannathan–Runkle, 1993) is the minimal fix: add one term that switches on only for negative shocks.

EQ T4.6 — GJR-GARCH(1,1) $$ \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \gamma\, \mathbb{1}_{\{\varepsilon_{t-1} < 0\}}\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 $$

The indicator $\mathbb{1}_{\{\varepsilon_{t-1} < 0\}}$ equals 1 after a negative shock and 0 otherwise, so a negative shock contributes $(\alpha+\gamma)\varepsilon_{t-1}^2$ while a positive one contributes only $\alpha\,\varepsilon_{t-1}^2$. A positive $\gamma$ is the leverage effect made into a parameter, and for equity indices $\gamma$ is reliably positive and often larger than $\alpha$ itself. Persistence becomes $\alpha + \beta + \tfrac{1}{2}\gamma$ (the $\tfrac12$ is the probability that a shock is negative, assuming a symmetric innovation distribution).
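A single step of EQ T4.6 is short enough to write out. The sketch below uses assumed, illustrative parameters (not estimates from data) and compares the variance after a 3% loss with the variance after a 3% gain from the same starting point.

```python
def gjr_next_variance(eps_prev, sigma2_prev, omega, alpha, gamma, beta):
    """One step of EQ T4.6: the gamma term switches on only after a negative shock."""
    leverage = gamma if eps_prev < 0 else 0.0
    return omega + (alpha + leverage) * eps_prev ** 2 + beta * sigma2_prev

# Illustrative parameters (assumed, not estimated).
omega, alpha, gamma, beta = 1e-5, 0.03, 0.10, 0.88
sigma2_prev = 0.0002

after_loss = gjr_next_variance(-0.03, sigma2_prev, omega, alpha, gamma, beta)
after_gain = gjr_next_variance(+0.03, sigma2_prev, omega, alpha, gamma, beta)
persistence = alpha + beta + 0.5 * gamma    # assumes P(shock < 0) = 1/2

print(f"sigma^2 after -3% day: {after_loss:.6f}")
print(f"sigma^2 after +3% day: {after_gain:.6f}")
print(f"down/up ratio        : {after_loss / after_gain:.2f}")
print(f"persistence          : {persistence:.2f}")
```

Setting `gamma` to zero makes the two variances identical, which is the nesting of plain GARCH inside GJR.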

EGARCH (Nelson, 1991) takes a different route: model the log of variance, which guarantees positivity without any constraints on the signs of the coefficients, and let the news term depend on both the magnitude and the sign of the standardized shock $z_{t-1} = \varepsilon_{t-1}/\sigma_{t-1}$.

EQ T4.7 — EGARCH(1,1) $$ \log \sigma_t^2 = \omega + \beta \log \sigma_{t-1}^2 + \alpha\Big(\,|z_{t-1}| - \mathbb{E}|z_{t-1}|\,\Big) + \theta\, z_{t-1} $$

The $\alpha$ term is the symmetric magnitude response; the $\theta\, z_{t-1}$ term is the asymmetry — with $\theta < 0$, a negative $z$ (a down day) raises $\log\sigma_t^2$ more than a positive one of equal size. Because it works in logs, EGARCH needs no positivity constraints and can express richer news-impact curves, at the cost of a likelihood that is fiddlier to optimize. Forecasting multiple steps ahead is also messier than GARCH's clean linear recursion.
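The EGARCH update of EQ T4.7 can be sketched in a few lines, assuming Gaussian innovations so that $\mathbb{E}|z| = \sqrt{2/\pi}$. The parameter values are illustrative assumptions; $\omega$ is chosen so that the long-run log-variance $\omega/(1-\beta)$ equals $\log(0.0002)$.

```python
import math

E_ABS_Z = math.sqrt(2 / math.pi)   # E|z| for a standard normal z

def egarch_next_variance(z_prev, sigma2_prev, omega, beta, alpha, theta):
    """One step of EQ T4.7; returns sigma_t^2, positive for any coefficient signs."""
    log_var = (omega + beta * math.log(sigma2_prev)
               + alpha * (abs(z_prev) - E_ABS_Z) + theta * z_prev)
    return math.exp(log_var)

# Illustrative parameters (assumed, not estimated).
beta, alpha, theta = 0.95, 0.15, -0.08
omega = (1 - beta) * math.log(0.0002)
sigma2_prev = 0.0002

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    s2 = egarch_next_variance(z, sigma2_prev, omega, beta, alpha, theta)
    print(f"z = {z:+.1f}  ->  sigma = {math.sqrt(s2):.4%}")
```

Because the recursion lives on the log scale, the output stays positive even if every coefficient is negative. A quiet day (z = 0) pulls the variance down, since $|z| - \mathbb{E}|z|$ is then negative.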

The news-impact curve — next period's variance plotted against this period's shock, holding $\sigma_{t-1}^2$ fixed — is the cleanest way to see the difference. Plain GARCH gives a symmetric parabola centered at zero; GJR and EGARCH tilt it, steepening the left (bad-news) arm.
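The three curves can be tabulated side by side. In this sketch all parameter values are assumptions for illustration, $\sigma_{t-1}^2$ is held fixed at 0.0002 for every model, and the EGARCH constant is set so that its curve sits near the same level.

```python
import math

E_ABS_Z = math.sqrt(2 / math.pi)          # E|z| for a standard normal z
SIGMA2_PREV = 0.0002                      # held fixed along the whole curve

def nic_garch(eps, omega=1e-5, alpha=0.06, beta=0.85):
    """Plain GARCH: depends on eps only through eps^2, so it is symmetric."""
    return omega + alpha * eps ** 2 + beta * SIGMA2_PREV

def nic_gjr(eps, omega=1e-5, alpha=0.06, gamma=0.10, beta=0.85):
    """GJR: the extra gamma applies only to negative shocks."""
    kick = alpha + (gamma if eps < 0 else 0.0)
    return omega + kick * eps ** 2 + beta * SIGMA2_PREV

def nic_egarch(eps, beta=0.95, alpha=0.15, theta=-0.08):
    """EGARCH: works on log-variance with the standardized shock z."""
    omega = (1 - beta) * math.log(SIGMA2_PREV)   # centers the curve near SIGMA2_PREV
    z = eps / math.sqrt(SIGMA2_PREV)
    return math.exp(omega + beta * math.log(SIGMA2_PREV)
                    + alpha * (abs(z) - E_ABS_Z) + theta * z)

print(" shock     GARCH       GJR      EGARCH   (next-period sigma^2)")
for pct in (-4, -2, -1, 0, 1, 2, 4):
    eps = pct / 100
    print(f" {eps:+.2f}   {nic_garch(eps):.6f}  {nic_gjr(eps):.6f}  {nic_egarch(eps):.6f}")
```

Reading down the GARCH column, rows for shocks of equal size match exactly. In the GJR and EGARCH columns the negative-shock rows are larger than their positive counterparts.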

INSTRUMENT T4.2 — NEWS-IMPACT CURVE · GARCH vs GJR · σ²ₜ vs εₜ₋₁ · EQ T4.4 / T4.6
[Interactive instrument. Controls: reaction α (default 0.06), leverage γ (default 0.10). Live readouts: σ² after a −3% day, σ² after a +3% day, down/up ratio.]

The blue parabola is symmetric GARCH — it does not care which way the market moved. The mint curve is GJR: raise the leverage γ and watch the left (loss) arm steepen while the right arm stays put, the kink at zero growing sharper. The down/up ratio is how much more a 3% loss inflates tomorrow's variance than a 3% gain — set γ = 0 and it snaps to exactly 1.0, recovering plain GARCH. For real equity indices this ratio is routinely 2 or more.

Which to use is genuinely contested. GJR is simpler, nests GARCH cleanly (test $\gamma=0$), and is easy to forecast, so most practitioners reach for it first. EGARCH is more flexible and unconstrained but harder to fit and to project forward, and its log scale makes the parameters less directly interpretable. Hansen & Lunde's large 2005 horse race, which compared 330 ARCH-type models, found no evidence that anything beat a plain GARCH(1,1) for the DM/USD exchange rate, while for IBM stock returns GARCH(1,1) was clearly inferior to models that accommodate a leverage effect. That is a useful humility check against over-engineering, and equally a reminder that asymmetry earns its keep in equities.

4.5

Forecasting volatility & the VaR link

The payoff of a fitted GARCH model is a forecast of future variance, and the recursion makes multi-step forecasts almost free. The one-step forecast is just the recursion evaluated at the last observed values. For horizons beyond one, the unknown future shock $\varepsilon_{t+h-1}^2$ is replaced by its expectation, which under the model is the forecast variance itself — collapsing the whole thing to clean geometric mean reversion toward $\bar\sigma^2$:

EQ T4.8 — h-STEP VARIANCE FORECAST $$ \mathbb{E}_t\!\left[\sigma_{t+h}^2\right] \;=\; \bar{\sigma}^2 \;+\; (\alpha+\beta)^{\,h-1}\big(\sigma_{t+1}^2 - \bar{\sigma}^2\big), \qquad h = 1, 2, 3, \ldots $$

The forecast is the long-run level $\bar\sigma^2$ plus the current deviation, geometrically discounted by the persistence $(\alpha+\beta)$ per step. From a calm start it climbs toward $\bar\sigma^2$; from a panic it decays toward it — the term structure of volatility. High persistence flattens the curve (a slow approach), low persistence snaps it back fast. Aggregating to an $H$-day variance sums these: $\sum_{h=1}^{H}\mathbb{E}_t[\sigma_{t+h}^2]$, which under iid would just be $H\sigma^2$ — the famous $\sqrt{H}$ scaling, which GARCH corrects whenever you are not already at the long-run level.
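The term structure and the aggregation step can be sketched directly from EQ T4.8. The inputs are assumed for illustration: a panic day with 3.5% daily vol, a 1.5% long-run level, and persistence 0.94.

```python
import math

def variance_forecast(h, sigma2_next, long_run_var, persistence):
    """EQ T4.8: expected variance h days ahead, given tomorrow's variance sigma2_next."""
    return long_run_var + persistence ** (h - 1) * (sigma2_next - long_run_var)

# Assumed illustrative inputs.
persistence = 0.94
long_run_var = 0.015 ** 2
sigma2_next = 0.035 ** 2

for h in (1, 2, 5, 10, 20, 60):
    vol_h = math.sqrt(variance_forecast(h, sigma2_next, long_run_var, persistence))
    print(f"h = {h:2d}   forecast daily vol = {vol_h:.2%}")

H = 10
garch_var = sum(variance_forecast(h, sigma2_next, long_run_var, persistence)
                for h in range(1, H + 1))
naive_var = H * sigma2_next                     # square-root-of-time rule
print(f"{H}-day vol, GARCH term structure : {math.sqrt(garch_var):.2%}")
print(f"{H}-day vol, naive sqrt(H) scaling: {math.sqrt(naive_var):.2%}")
```

Starting from a panic, the aggregated GARCH figure comes out below the naive one because each later day is forecast closer to the long-run level. Starting from below the long-run level, the ordering reverses.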

This term structure is precisely what an option's implied-volatility surface tries to price, and what a risk system needs to project losses over a 1-day or 10-day horizon. The most consequential application is Value-at-Risk (VaR): the loss threshold a portfolio will not exceed with probability $1-p$ over a given horizon. Plug the GARCH conditional volatility into the quantile of the innovation distribution:

EQ T4.9 — CONDITIONAL VALUE-AT-RISK (PARAMETRIC) $$ \mathrm{VaR}_{t}^{\,p} \;=\; -\Big(\mu + z_{p}\,\sigma_{t}\Big), \qquad z_{p} = \Phi^{-1}(p) $$

$z_p$ is the lower-tail quantile of the standardized innovation ($\Phi^{-1}(0.01) \approx -2.326$ for a Gaussian 1% VaR; for fat tails use the quantile of a Student-t rescaled to unit variance). Because $\sigma_t$ is conditional, the VaR breathes with the market: it widens automatically in turbulent clusters and tightens in calm, unlike a static historical VaR that lags the regime badly. The 10-day regulatory VaR scales by the GARCH variance forecast $\sqrt{\sum_{h=1}^{10}\mathbb{E}_t[\sigma_{t+h}^2]}$, not by a naive $\sqrt{10}\,\sigma_t$; the difference is exactly the mean reversion of EQ T4.8.
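EQ T4.9 and the 10-day aggregation fit in a short sketch using only the standard library (`statistics.NormalDist` supplies the Gaussian quantile). The inputs are assumed for illustration, the mean is set to zero over both horizons, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def parametric_var(sigma, p=0.01, mu=0.0):
    """EQ T4.9 with Gaussian innovations: VaR reported as a positive loss fraction."""
    z_p = NormalDist().inv_cdf(p)          # lower-tail quantile, about -2.326 at p = 0.01
    return -(mu + z_p * sigma)

def garch_horizon_vol(H, sigma2_next, long_run_var, persistence):
    """Square root of the summed EQ T4.8 variance forecasts over H days."""
    total = sum(long_run_var + persistence ** (h - 1) * (sigma2_next - long_run_var)
                for h in range(1, H + 1))
    return math.sqrt(total)

# Assumed inputs: a turbulent day (3.5% vol), 1.5% long-run vol, persistence 0.94.
sigma_next, long_run_vol, persistence = 0.035, 0.015, 0.94

var_1d = parametric_var(sigma_next)
var_10d_garch = parametric_var(
    garch_horizon_vol(10, sigma_next ** 2, long_run_vol ** 2, persistence))
var_10d_naive = parametric_var(math.sqrt(10) * sigma_next)

print(f"1-day 99% VaR            : {var_1d:.2%}")
print(f"10-day 99% VaR (GARCH)   : {var_10d_garch:.2%}")
print(f"10-day 99% VaR (sqrt-10) : {var_10d_naive:.2%}")
```

In a turbulent state the naive scaling overstates the 10-day figure, because it assumes today's elevated variance persists unchanged for all ten days.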

KEY

Why a conditional VaR matters. A static VaR built on a trailing 250-day window treats March 2020 and a sleepy summer as equally likely tomorrow. It under-warns going into a crisis (the window is still full of calm days) and over-warns coming out of one (the window is still full of the crash). GARCH-based VaR reacts within a day or two because $\sigma_t$ is recomputed every step — the practical reason banks adopted conditional volatility models for capital after 1996.
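The lag of a trailing window can be seen on a deliberately artificial path: 250 calm days, a three-day crash, then calm again. Everything here is a hypothetical construction. The GARCH parameters are assumed, with $\omega$ chosen so that the model's long-run vol matches the calm 0.5% level, and the rolling estimate is a root-mean-square return over the previous 250 days.

```python
import math

omega, alpha, beta = 1.25e-6, 0.10, 0.85       # assumed; long-run vol = 0.5% per day
calm = [0.005 if t % 2 == 0 else -0.005 for t in range(250)]
crash = [-0.04, 0.035, -0.045]
returns = calm + crash + calm[:40]              # hypothetical path: calm, crash, calm

def rolling_vol(r, t, window=250):
    """Root-mean-square return over the `window` days ending at day t - 1."""
    past = r[max(0, t - window):t]
    return math.sqrt(sum(x * x for x in past) / len(past))

# GARCH variance in force on each day, from the recursion of EQ T4.4.
garch_var = [omega / (1 - alpha - beta)]
for x in returns[:-1]:
    garch_var.append(omega + alpha * x * x + beta * garch_var[-1])

print(" day                       rolling-250 vol   GARCH vol")
for label, t in [("last calm day before", 249), ("day after the crash", 253),
                 ("40 days after the crash", 292)]:
    print(f" {label:24s}  {rolling_vol(returns, t):.2%}            "
          f"{math.sqrt(garch_var[t]):.2%}")
```

The day after the crash the rolling estimate has barely moved while the GARCH vol has multiplied. Forty days on, the GARCH vol is nearly back at its calm level, but the rolling window still carries the crash and will keep doing so until those days drop out.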

INSTRUMENT T4.3 — VOLATILITY FORECAST TERM STRUCTURE · EQ T4.8 · MEAN REVERSION TO σ̄ · LIVE
[Interactive instrument. Controls: persistence α+β (default 0.94), today's daily vol σₜ₊₁ (default 3.5%). Live readouts: long-run daily vol σ̄, 10-day vol (GARCH), 10-day 99% VaR.]

Long-run vol is pinned at σ̄ = 1.5%/day (≈ 24% annualized). Start above it — a panic — and the mint term-structure curve decays back toward the dashed long-run line; drag today's vol below σ̄ and it climbs. Crank persistence toward 1 and the curve flattens to a near-horizontal plateau (shocks barely revert). The 10-day VaR reads off the aggregated GARCH variance with a Gaussian 99% quantile (z ≈ 2.326) — compare it mentally to a naive √10 × σₜ₊₁ and notice how much the mean reversion matters when you start far from σ̄.

GARCH is not the last word. It assumes the variance process is driven only by past returns; realized-volatility models (HAR-RV) instead feed high-frequency intraday data straight in and routinely forecast better. Stochastic-volatility models give variance its own innovation term rather than making it a deterministic function of past shocks — more flexible, harder to estimate. And implied volatility from options markets is forward-looking in a way no return-based model can be. But for a three-parameter model you can fit in milliseconds and explain on a napkin, GARCH(1,1) remains the benchmark every richer model must beat — and frequently does not.

We have modeled the volatility of one series in isolation — but risk lives in how series move together. A portfolio's variance is a quadratic form in a whole covariance matrix, and in a crisis correlations snap toward one exactly when diversification is supposed to save you. Chapter 05 turns the dial from one dimension to many: Vector Autoregression (VAR) for the joint dynamics of several series, the cross-correlations and Granger causality they encode, and the multivariate-GARCH machinery (DCC) that lets the whole covariance matrix breathe through time.

4.R

References

Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 50(4) — the original ARCH model (EQ T4.2); the work cited for Engle's 2003 Nobel Prize.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics 31(3) — adds the lagged-variance term, giving GARCH(1,1) (EQ T4.4), the field's workhorse.
Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica 59(2) — the EGARCH model (EQ T4.7), capturing the leverage effect in log-variance.
Glosten, L. R., Jagannathan, R. & Runkle, D. E. (1993). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. Journal of Finance 48(5) — the GJR-GARCH asymmetric extension (EQ T4.6).
Hansen, P. R. & Lunde, A. (2005). A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)? Journal of Applied Econometrics 20(7) — the large horse race finding GARCH(1,1) hard to beat for exchange rates, though leverage models beat it for IBM stock returns.
Engle, R. F. (2002). Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models. Journal of Business & Economic Statistics 20(3) — the DCC model bridging to the multivariate chapter.
Mandelbrot, B. (1963). The Variation of Certain Speculative Prices. Journal of Business 36(4) — the first clear statement of volatility clustering and fat tails (EQ T4.1).