Stochastic Processes — Brownian Motion & Itô

1.1

Random walks & martingales

Start in discrete time. A drunkard stands on the integers and at each tick flips a fair coin: heads, step right; tails, step left. After $n$ steps his position is a sum of independent $\pm 1$ increments — a simple random walk. He has no memory and no destination; the best forecast of where he will be next is exactly where he is now. That property has a name, and it is the spine of mathematical finance.

A process $M_t$ is a martingale if its expected future value, given everything known so far, equals its present value:

EQ Q1.1 — THE MARTINGALE PROPERTY $$ \mathbb{E}\!\big[\,M_{t+s} \mid \mathcal{F}_t\,\big] \;=\; M_t \qquad \text{for all } s \ge 0 $$

$\mathcal{F}_t$ is the filtration — the entire history available up to time $t$. A martingale is the formal version of a fair game: no strategy using only past information can tilt the expected outcome. The fair random walk is a martingale; so, under the right probability measure, is a discounted asset price — which is exactly why §Q3.1 can price an option as a discounted expectation. A submartingale drifts up, a supermartingale drifts down.
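The fair-game claim can be checked numerically. The sketch below is illustrative rather than part of the original text: it simulates fair $\pm 1$ walks, groups paths by their position at step $k$, and compares the average position $s$ steps later with the starting value. Conditioning on the current position alone stands in for conditioning on $\mathcal{F}_k$, which is legitimate here because the walk's future increments are independent of its past. The values of `k`, `s`, the seed and the path count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed so the check is repeatable
n_paths, k, s = 100_000, 20, 30          # condition at step k, look s steps ahead

steps = rng.choice([-1, 1], size=(n_paths, k + s))   # fair +/-1 coin flips
walk = np.cumsum(steps, axis=1)
now, later = walk[:, k - 1], walk[:, k + s - 1]      # M_k and M_{k+s}

# Group paths by their position at step k. The average future position in
# each group estimates E[M_{k+s} | M_k = m], which should be close to m.
cond_mean = {}
for m in (-4, 0, 4):
    group = later[now == m]
    cond_mean[m] = group.mean()
    print(f"M_k = {m:+d}: {group.size:6d} paths, mean of M_(k+s) = {cond_mean[m]:+.3f}")
```

Each conditional mean should sit near the conditioning value up to Monte-Carlo noise: knowing where the walk stands now is the best forecast of where it will be later.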

The increments of the walk are independent and identically distributed with mean zero and unit variance, so after $n$ steps the position has mean $0$ and variance $n$ — variance accumulates linearly in the number of steps, and the typical distance travelled grows like $\sqrt{n}$. This is the diffusive scaling that will reappear, unchanged, in continuous time: spread widens with the square root of elapsed time, never linearly. A walk that moved linearly with $n$ would be a deterministic trend with noise; the $\sqrt{n}$ law is the signature of pure randomness with no edge.

Now refine the clock. Take $n$ steps in a fixed window of length $t$, each of size $\sqrt{t/n}$ so that variance is conserved at $t$ regardless of $n$. Let $n \to \infty$. By the central limit theorem the endpoint becomes Gaussian, and by its functional extension (Donsker's theorem) the whole rescaled path converges in distribution to a continuous Gaussian process: Brownian motion. The drunkard's discrete stagger becomes a path that is continuous everywhere yet jagged at every magnification, and the whole apparatus of §1.2 onward is the price we pay for that limit.
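A small sketch of the rescaling, under assumed values for the window `t`, the seed and the path count. It uses the fact that the number of heads in $n$ fair flips is Binomial$(n, \tfrac12)$, so the endpoint can be sampled without storing every step. The variance should stay at $t$ for every $n$, while the kurtosis (the fourth moment divided by the squared variance) should climb toward the Gaussian value 3; for a sum of $n$ fair $\pm 1$ steps it equals $3 - 2/n$ exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
t, n_paths = 2.0, 50_000                 # assumed window length and sample size

results = {}
for n in (4, 64, 1024):
    # Endpoint of n steps of size sqrt(t/n): (heads - tails) * step size.
    heads = rng.binomial(n, 0.5, size=n_paths)
    end = (2 * heads - n) * np.sqrt(t / n)
    var = end.var()
    kurt = np.mean(end**4) / np.mean(end**2) ** 2
    results[n] = (var, kurt)
    print(f"n = {n:5d}  var = {var:.3f} (target {t})  "
          f"kurtosis = {kurt:.3f} (exact {3 - 2 / n:.3f}, Gaussian 3)")
```

The variance is pinned by the step size; only the shape of the distribution changes as $n$ grows.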

A simple random walk takes steps that are independent, mean $0$, variance $1$. After $n = 9$ steps, what is the standard deviation of its position? (Variance is $n$; take $\sqrt{n}$.)

Independent increments make variances add: $\mathrm{Var} = n = 9$. The standard deviation is $\sqrt{9} = $ 3. Spread grows like $\sqrt{n}$, not $n$ — the diffusive law that survives into continuous time.

PYTHON · RUNNABLE IN-BROWSER

# A fair random walk converges to Brownian motion: variance grows like n
import numpy as np
rng = np.random.default_rng(0)

n_paths, n_steps = 20000, 400
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))   # fair +/-1 coin flips
walk  = np.cumsum(steps, axis=1)                           # positions over time

ks = [25, 100, 400]
print(" step n    sample Var    theory (= n)")
for k in ks:
    print(f" {k:5d} {walk[:, k-1].var():12.2f} {k:14d}")

# fair game: E[next | now] = now, so increments have zero conditional mean
incr = walk[:, 1:] - walk[:, :-1]
print(f"\nmean increment (should be ~0): {incr.mean():+.4f}")
print("variance = number of steps  ->  std grows like sqrt(n): the martingale walk")
plot_xy(list(range(1, n_steps + 1)), walk[:8].T.tolist())  # 8 sample paths


1.2

Brownian motion (the Wiener process)

The limit of the rescaled walk is standard Brownian motion $W_t$, also called the Wiener process after Norbert Wiener, who first proved it exists as a rigorous mathematical object. It is defined by four axioms — every later equation in this volume is a consequence of them:

EQ Q1.2 — DEFINING AXIOMS OF $W_t$ $$ W_0 = 0; \quad W_t - W_s \sim \mathcal{N}(0,\, t - s) \text{ for } 0 \le s < t; \quad \text{increments on disjoint intervals are independent}; \quad t \mapsto W_t \text{ is continuous.} $$

Increments are Gaussian with variance equal to the elapsed time, memoryless, and the path never jumps. Two immediate consequences: $\mathbb{E}[W_t] = 0$ and $\mathrm{Var}(W_t) = t$ — the variance is the clock. $W_t$ is itself a martingale (EQ Q1.1), the continuous-time fair game. Adding a slope and a scale gives Brownian motion with drift: $X_t = x_0 + \mu t + \sigma W_t$, where $\mu$ tilts the average path and $\sigma$ sets how wide the fan of outcomes opens.

Two facts about $W_t$ defy ordinary intuition, and both matter for what follows. First, the path is continuous but nowhere differentiable: there is no well-defined velocity at any instant, because over a tiny interval $\mathrm{d}t$ the displacement is of order $\sqrt{\mathrm{d}t}$, so the ratio $\mathrm{d}W/\mathrm{d}t$ behaves like $1/\sqrt{\mathrm{d}t} \to \infty$. You cannot zoom in until the wiggling smooths out — magnify any segment and it looks statistically identical to the whole, a self-similar fractal.
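The blow-up of the difference quotient is easy to see numerically. This is a minimal sketch with an assumed seed and sample size: it draws increments $W_{t+\mathrm{d}t} - W_t \sim \mathcal{N}(0, \mathrm{d}t)$ for shrinking $\mathrm{d}t$ and reports the root-mean-square of $\mathrm{d}W/\mathrm{d}t$, which should track $1/\sqrt{\mathrm{d}t}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples = 200_000

rms = {}
for dt in (1e-2, 1e-4, 1e-6):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)   # W_{t+dt} - W_t
    rms[dt] = np.sqrt(np.mean((dW / dt) ** 2))          # RMS difference quotient
    print(f"dt = {dt:.0e}   RMS |dW/dt| = {rms[dt]:10.2f}   1/sqrt(dt) = {dt ** -0.5:10.2f}")
```

Shrinking the interval by a factor of 100 makes the apparent velocity ten times larger, so no finite limit exists.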

Second, and decisively, is the quadratic variation. Sum the squared increments of $W$ over a partition of $[0,t]$; unlike a smooth curve, whose squared increments vanish in the limit, the sum converges to $t$ itself — not to zero, and with no randomness left over:

EQ Q1.3 — QUADRATIC VARIATION: $(\mathrm{d}W)^2 = \mathrm{d}t$ $$ \sum_{i} \big(W_{t_{i+1}} - W_{t_i}\big)^2 \;\xrightarrow{\;\Delta t \to 0\;}\; t \qquad\Longleftrightarrow\qquad (\mathrm{d}W_t)^2 = \mathrm{d}t,\;\; (\mathrm{d}t)^2 = 0,\;\; \mathrm{d}W_t\,\mathrm{d}t = 0 $$

This is the single most important line in the chapter. For a smooth function $\sum (\Delta f)^2 \to 0$, so calculus can ignore second-order terms. For Brownian motion it does not vanish: it accumulates deterministically at rate $\mathrm{d}t$. The mnemonic $(\mathrm{d}W)^2 = \mathrm{d}t$ is exactly the term ordinary calculus throws away — and keeping it is what produces Itô's lemma (§1.4) and the $\tfrac12\sigma^2$ corrections that haunt the rest of this volume.
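EQ Q1.3 can be tested directly. The sketch below, with an assumed seed and horizon, sums squared increments over finer and finer grids on $[0, t]$ for two objects: a Brownian path and the smooth curve $\sin(u)$, chosen here only as a convenient comparison. For simplicity it draws a fresh Brownian sample at each grid size rather than refining a single path.

```python
import numpy as np

rng = np.random.default_rng(11)
t = 1.0

qv_brownian, qv_smooth = {}, {}
for n in (10, 1_000, 100_000):
    dt = t / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)     # Brownian increments on the grid
    grid = np.linspace(0.0, t, n + 1)
    df = np.diff(np.sin(grid))                    # increments of a smooth curve
    qv_brownian[n] = np.sum(dW**2)
    qv_smooth[n] = np.sum(df**2)
    print(f"n = {n:7d}   sum (dW)^2 = {qv_brownian[n]:.4f}   sum (df)^2 = {qv_smooth[n]:.2e}")
```

The Brownian column should settle near $t$ with shrinking scatter, while the smooth column decays roughly in proportion to $1/n$.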

One honest caveat. Real prices are not literally Brownian. They jump on news, their volatility clusters and spikes, and a strictly Brownian model gives prices that can go negative — which is why §1.3 works with the logarithm instead. Bachelier's 1900 thesis modelled prices as arithmetic Brownian motion and was decades ahead of its time; the modern repair, geometric Brownian motion, keeps the tractability while fixing the sign. Brownian motion is the idealization that makes the algebra possible, not a faithful portrait of a tape.

For standard Brownian motion ($\sigma = 1$), what is the variance of $W_t$ at time $t = 4$? (Use $\mathrm{Var}(\sigma W_t) = \sigma^2 t$.)

With $\sigma = 1$, $\mathrm{Var}(\sigma W_t) = \sigma^2 t = 1 \cdot t = t$. At $t = 4$ the variance is $1 \times 4 = $ 4. The variance of Brownian motion is just the elapsed time: the clock measured in spread-squared.

INSTRUMENT Q1.1 — BROWNIAN PATH GENERATOR · DRIFT + DIFFUSION · EQ Q1.2 · MANY PATHS

[Interactive figure. Controls: drift μ = 0.00, vol σ = 0.30, paths = 40. Readouts: mean at T=1 (μ), std at T=1 (σ√T), ±1σ envelope.]

Every grey line is one path of $X_t = \mu t + \sigma W_t$ over $[0,1]$; the mint line is the mean $\mu t$ and the dashed envelope is $\mu t \pm \sigma\sqrt{t}$. Set σ small and the paths hug the drift; widen it and the fan opens like the mouth of a trumpet. Note that it opens as $\sqrt{t}$, not $t$: the diffusive law from §1.1. Drag drift to zero to see a pure martingale: outcomes spread symmetrically around the start with no preferred direction.

1.3

Geometric Brownian motion — the stock model

A stock cannot follow plain Brownian motion: prices would wander below zero, and a \$5 stock and a \$500 stock would feel the same dollar shocks rather than the same percentage shocks. The fix is to put the randomness on the returns, not the price level. The instantaneous return earns a drift $\mu$ plus a noise $\sigma\,\mathrm{d}W_t$:

EQ Q1.4 — GEOMETRIC BROWNIAN MOTION (SDE) $$ \frac{\mathrm{d}S_t}{S_t} = \mu\,\mathrm{d}t + \sigma\,\mathrm{d}W_t \qquad\Longleftrightarrow\qquad \mathrm{d}S_t = \mu\,S_t\,\mathrm{d}t + \sigma\,S_t\,\mathrm{d}W_t $$

$\mu$ is the expected return per unit time, $\sigma$ the volatility. Because the noise scales with $S_t$, the price cannot reach zero in finite time and shocks act multiplicatively — exactly how real returns behave. This is the model under Black–Scholes (Vol · EQ Q3.2); the same equation reappears there with $\mu$ replaced by the riskless rate $r$ under the risk-neutral measure.

Solving EQ Q1.4 requires applying calculus to $\log S_t$, which is precisely where Itô's lemma enters (§1.4). The result — derived in full next section — is the closed-form solution:

EQ Q1.5 — GBM CLOSED FORM & ITS LOGNORMAL LAW $$ S_T = S_0 \exp\!\Big[\big(\mu - \tfrac{1}{2}\sigma^2\big)T + \sigma W_T\Big], \qquad \ln\!\frac{S_T}{S_0} \sim \mathcal{N}\!\big((\mu - \tfrac{1}{2}\sigma^2)T,\; \sigma^2 T\big) $$

The log-price is Brownian motion with drift — Gaussian — so the price itself is lognormal: never negative, right-skewed. Note the drift of the log is $\mu - \tfrac12\sigma^2$, not $\mu$. That missing $\tfrac12\sigma^2$ is the Itô correction, sometimes called volatility drag: the median compounded return falls below the mean by half the variance. It is why a volatile asset with positive expected return can still have a typical (median) path that loses money.
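To make the last claim concrete, here is a sketch with illustrative parameters that are not from the text: $\mu = 0.05$, $\sigma = 0.40$ and a ten-year horizon, so the log-drift $\mu - \tfrac12\sigma^2 = -0.03$ is negative even though the expected return is positive. It samples $S_T$ from the closed form of EQ Q1.5 and compares the mean, the median and the share of paths that end below the starting price.

```python
import numpy as np

rng = np.random.default_rng(5)
S0, mu, sig, T = 100.0, 0.05, 0.40, 10.0   # illustrative values, not from the text
n_paths = 200_000

W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)            # W_T ~ N(0, T)
S_T = S0 * np.exp((mu - 0.5 * sig**2) * T + sig * W_T)     # closed form, EQ Q1.5

mean_ST = S_T.mean()
median_ST = np.median(S_T)
frac_loss = np.mean(S_T < S0)              # share of paths ending below S0

print(f"mean   S_T : {mean_ST:8.2f}  (theory {S0 * np.exp(mu * T):8.2f})")
print(f"median S_T : {median_ST:8.2f}  (theory {S0 * np.exp((mu - 0.5 * sig**2) * T):8.2f})")
print(f"paths ending below S0: {frac_loss:.1%}")
```

Under these assumed parameters the average outcome is a gain while more than half of the paths finish at a loss; the gap widens with $\sigma$ and with the horizon.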

The two moments of $S_T$ follow from the lognormal law and are worth memorizing because every Monte-Carlo check reduces to them:

EQ Q1.6 — TERMINAL MEAN AND VARIANCE $$ \mathbb{E}[S_T] = S_0\,e^{\mu T}, \qquad \mathrm{Var}(S_T) = S_0^2\,e^{2\mu T}\big(e^{\sigma^2 T} - 1\big) $$

The mean grows at the full rate $\mu$ — the $\tfrac12\sigma^2$ correction is hidden in the difference between mean and median. As $\sigma \to 0$ the variance vanishes and the price becomes the deterministic compounding $S_0 e^{\mu T}$. Mean and median diverge as volatility rises: the mean is dragged up by the fat right tail of the lognormal, while the typical realized path sits near the lower median.

A stock follows GBM with volatility $\sigma = 0.2$. By how much does the log-drift fall below the arithmetic drift $\mu$ — i.e. what is the Itô correction $\tfrac12\sigma^2$?

The correction is $\tfrac{1}{2}\sigma^2 = \tfrac{1}{2}(0.2)^2 = \tfrac{1}{2}(0.04) = $ 0.02. So $\ln(S_T/S_0)$ drifts at $\mu - 0.02$ per year — two percentage points of volatility drag from a 20% vol.

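INSTRUMENT Q1.2 — GBM SIMULATOR + LOGNORMAL TERMINAL · EQ Q1.5 · PATHS → HISTOGRAM OF $S_T$

[Interactive figure. Controls: drift μ = 0.08, vol σ = 0.25, horizon T = 1.00 yrs. Readouts: mean $\mathbb{E}[S_T] = S_0 e^{\mu T}$, median $S_0 e^{(\mu - \frac12\sigma^2)T}$, drag $\tfrac12\sigma^2 T$.]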

Left: a fan of GBM price paths from $S_0 = 100$. Right: the histogram of terminal prices $S_T$ — visibly right-skewed, the lognormal shape. The mint marker is the mean $S_0 e^{\mu T}$; the blue marker is the median $S_0 e^{(\mu - \frac12\sigma^2)T}$. Crank σ up and watch the two markers pull apart — the gap is the volatility drag, mean to the right, median dragged left.

PYTHON · RUNNABLE IN-BROWSER

# Simulate Brownian & GBM paths; terminal mean/var vs theory (EQ Q1.5-Q1.6)
import numpy as np
rng = np.random.default_rng(1)

S0, mu, sig, T = 100.0, 0.08, 0.25, 1.0
n_paths, n_steps = 100000, 250
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))   # Brownian increments
W_T = dW.sum(axis=1)                                          # W_T ~ N(0, T)
print(f"Brownian W_T : mean {W_T.mean():+.4f} (theory 0)   var {W_T.var():.4f} (theory {T:.4f})")

# closed-form GBM terminal: log-drift carries the -1/2 sigma^2 correction
S_T = S0 * np.exp((mu - 0.5*sig**2)*T + sig*W_T)
mean_thy = S0*np.exp(mu*T)
var_thy  = S0**2*np.exp(2*mu*T)*(np.exp(sig**2*T) - 1)
print(f"GBM  E[S_T]  : sim {S_T.mean():8.3f}   theory {mean_thy:8.3f}")
print(f"GBM  Var(S_T): sim {S_T.var():8.2f}   theory {var_thy:8.2f}")
print(f"median S_T   : sim {np.median(S_T):8.3f}   theory {S0*np.exp((mu-0.5*sig**2)*T):8.3f}")
print("\nmean > median: the lognormal right tail drags the average up.")
plot_xy(sorted(S_T[:5000].tolist()), np.linspace(0,1,5000).tolist())  # empirical CDF


1.4

Itô's lemma

Here is the heart of the chapter. In ordinary calculus, the chain rule for a function $f(x)$ of a smooth path keeps only the first derivative: $\mathrm{d}f = f'(x)\,\mathrm{d}x$. The second-order term $\tfrac12 f''(x)(\mathrm{d}x)^2$ is discarded because $(\mathrm{d}x)^2$ is negligible for a smooth curve. For a Brownian path it is not negligible — EQ Q1.3 says $(\mathrm{d}W)^2 = \mathrm{d}t$ — so that second-order term refuses to die. Keeping it gives Itô's lemma, the chain rule of stochastic calculus, proved by Kiyosi Itô in 1944:

EQ Q1.7 — ITÔ'S LEMMA $$ \mathrm{d}f(t, X_t) = \underbrace{\Big(\frac{\partial f}{\partial t} + \mu\,\frac{\partial f}{\partial x} + \tfrac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2}\Big)\mathrm{d}t}_{\text{drift}} + \underbrace{\sigma\,\frac{\partial f}{\partial x}\,\mathrm{d}W_t}_{\text{diffusion}}, \qquad \mathrm{d}X_t = \mu\,\mathrm{d}t + \sigma\,\mathrm{d}W_t $$

Everything matches the ordinary chain rule except the $\tfrac12\sigma^2\,\partial^2 f/\partial x^2$ term in the drift. It is the trace left by quadratic variation: when you Taylor-expand $f$ to second order, the $(\mathrm{d}X)^2$ term contributes $\sigma^2(\mathrm{d}W)^2 = \sigma^2\,\mathrm{d}t$, which is first-order in time, not negligible. Randomness adds a deterministic drift to any nonlinear function of a random path. Convex $f$ ($f'' > 0$) gets an upward kick; concave $f$ (like $\log$) gets dragged down, which is the volatility drag of §1.3.
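As a worked example of the concave case, apply the lemma to $f(S) = \ln S$ under GBM (EQ Q1.4). The sketch assumes the standard extension of EQ Q1.7 to state-dependent coefficients, in which $\mu$ and $\sigma$ are replaced by the drift $\mu S_t$ and diffusion $\sigma S_t$ of the SDE. With $\partial f/\partial t = 0$, $\partial f/\partial S = 1/S$ and $\partial^2 f/\partial S^2 = -1/S^2$:

$$ \mathrm{d}(\ln S_t) = \Big(\mu S_t \cdot \frac{1}{S_t} + \tfrac{1}{2}\sigma^2 S_t^2 \cdot \Big(-\frac{1}{S_t^2}\Big)\Big)\mathrm{d}t + \sigma S_t \cdot \frac{1}{S_t}\,\mathrm{d}W_t = \big(\mu - \tfrac{1}{2}\sigma^2\big)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t $$

The right-hand side has constant coefficients, so integrating from $0$ to $T$ with $W_0 = 0$ gives

$$ \ln S_T - \ln S_0 = \big(\mu - \tfrac{1}{2}\sigma^2\big)T + \sigma W_T $$

and exponentiating recovers the closed form of EQ Q1.5, including the $-\tfrac12\sigma^2$ in the log-drift.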

The cleanest demonstration that ordinary calculus fails uses $f(W) = W^2$. Naïve calculus would write $\mathrm{d}(W^2) = 2W\,\mathrm{d}W$. Itô's lemma, with $\mu = 0,\ \sigma = 1$, $f' = 2W$, $f'' = 2$, instead gives:

EQ Q1.8 — THE EXTRA $\mathrm{d}t$ MADE VISIBLE $$ \mathrm{d}(W_t^2) = 2W_t\,\mathrm{d}W_t + \mathrm{d}t \qquad\Longrightarrow\qquad \mathbb{E}[W_t^2] = t $$

The lone $+\,\mathrm{d}t$ is the entire difference between the two calculi. The integrated diffusion term $\int_0^t 2W_s\,\mathrm{d}W_s$ is a martingale with expectation zero, so taking expectations leaves $\mathbb{E}[W_t^2] = t$, which is just $\mathrm{Var}(W_t) = t$ recovered the long way. The ordinary-calculus rule $\mathrm{d}(W^2) = 2W\,\mathrm{d}W$ would instead predict $\mathbb{E}[W_t^2] = 0$; reality is $t$. That gap, summed across a portfolio, is what a hedging desk actually trades (Vol · §Q3.4).
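EQ Q1.8 can be checked path by path. In the sketch below (grid size, seed and horizon are assumptions) the Itô integral $\int 2W\,\mathrm{d}W$ is approximated by a left-endpoint sum, which is the evaluation point the Itô convention prescribes. On a discrete grid the identity $W_{i+1}^2 - W_i^2 = 2W_i\,\Delta W_i + (\Delta W_i)^2$ holds exactly, so the gap between $W_t^2$ and the discrete integral is the realized quadratic variation, which should be close to $t$.

```python
import numpy as np

rng = np.random.default_rng(9)
t, n_steps, n_paths = 2.0, 400, 10_000
dt = t / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])   # W at the left end of each step

ito_integral = np.sum(2.0 * W_left * dW, axis=1)   # left-endpoint sum for int 2W dW
W_t_sq = W[:, -1] ** 2

mean_W_sq = W_t_sq.mean()
mean_ito = ito_integral.mean()
mean_gap = (W_t_sq - ito_integral).mean()          # realized sum of (dW)^2

print(f"E[W_t^2]               : {mean_W_sq:.4f}  (theory {t})")
print(f"E[int 2W dW]           : {mean_ito:+.4f}  (theory 0)")
print(f"E[W_t^2 - int 2W dW]   : {mean_gap:.4f}  (theory {t})")
```

The martingale term averages to roughly zero, and what is left over is the $\mathrm{d}t$ term accumulated to $t$.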

Using Itô's lemma for $f(W) = W^2$ (EQ Q1.8), what is $\mathbb{E}[W_t^2]$ at time $t = 9$?

Taking expectations of $\mathrm{d}(W^2) = 2W\,\mathrm{d}W + \mathrm{d}t$, the martingale term vanishes, leaving $\mathbb{E}[W_t^2] = t$. At $t = 9$ that is 9 — equal to $\mathrm{Var}(W_9)$, since $\mathbb{E}[W_9] = 0$. Naïve calculus would have wrongly said $0$.

INSTRUMENT Q1.3 — ITÔ vs ORDINARY CALCULUS · THE ½σ² CORRECTION · EQ Q1.7

[Interactive figure. Controls: vol σ = 0.40, function f. Readouts: ordinary (no ½σ²), Itô (with correction), simulated mean.]

For f = ln S (GBM, μ = 0): ordinary calculus predicts $\mathbb{E}[\ln(S_T/S_0)] = 0$; Itô predicts $-\tfrac12\sigma^2 T$, and the simulated mean (10,000 paths, seeded) lands on the Itô value. For f = W²: ordinary calculus implies mean $0$; Itô gives $t$. The grey histogram is the simulated distribution of the function's increment; the two vertical markers are the two theories, and only the mint (Itô) one hits the data. Slide σ and watch ordinary calculus fall further behind.

1.5

Stochastic differential equations

A stochastic differential equation (SDE) is the master template of every model in this volume. It writes the infinitesimal change of a process as a deterministic drift plus a random diffusion:

EQ Q1.9 — THE GENERAL ITÔ SDE $$ \mathrm{d}X_t = \underbrace{a(X_t, t)\,\mathrm{d}t}_{\text{drift}} + \underbrace{b(X_t, t)\,\mathrm{d}W_t}_{\text{diffusion}} $$

$a(\cdot)$ is the local mean rate, $b(\cdot)$ the local volatility. Choosing the two functions is choosing the model. Constant $a,b$: Brownian motion with drift (§1.2). $a = \mu x,\ b = \sigma x$: geometric Brownian motion (§1.3). $a = \kappa(\theta - x),\ b = \sigma$: the mean-reverting Ornstein–Uhlenbeck / Vasicek process. The same Itô machinery (§1.4) integrates them all.

Most SDEs have no closed-form solution, so they are solved numerically. The simplest scheme is Euler–Maruyama: step forward by $\Delta t$, adding a drift increment and a Gaussian noise of standard deviation $b\sqrt{\Delta t}$ — note the $\sqrt{\Delta t}$, the diffusive scaling of §1.1 made into code:

EQ Q1.10 — EULER–MARUYAMA STEP $$ X_{t+\Delta t} = X_t + a(X_t, t)\,\Delta t + b(X_t, t)\,\sqrt{\Delta t}\;Z, \qquad Z \sim \mathcal{N}(0, 1) $$

The noise term carries $\sqrt{\Delta t}$, not $\Delta t$, because it discretizes $\mathrm{d}W$, whose standard deviation over a step of length $\Delta t$ is $\sqrt{\Delta t}$. Getting that exponent wrong is the single most common bug in a Monte-Carlo engine. Euler–Maruyama has strong convergence order $\tfrac12$; the Milstein scheme adds a correction term to reach order $1$. For GBM specifically, prefer the exact log-update $S_{t+\Delta t} = S_t \exp[(\mu - \tfrac12\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z]$, which has no discretization error at all.
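The order-$\tfrac12$ claim can be probed on GBM, where the exact solution is available for comparison. The sketch below uses illustrative parameters and one possible way to set up the experiment: it draws Brownian increments on a fine grid, sums them in blocks to obtain coarser steps of the same path, runs Euler–Maruyama at each resolution, and measures the mean absolute gap to the exact terminal price driven by the same noise. Cutting $\Delta t$ by a factor of 4 should roughly halve the error.

```python
import numpy as np

rng = np.random.default_rng(21)
S0, mu, sig, T = 100.0, 0.08, 0.25, 1.0      # illustrative parameters
n_paths, n_fine = 20_000, 256

dt_fine = T / n_fine
dW_fine = rng.normal(0.0, np.sqrt(dt_fine), size=(n_paths, n_fine))
W_T = dW_fine.sum(axis=1)
S_exact = S0 * np.exp((mu - 0.5 * sig**2) * T + sig * W_T)   # exact solution, same noise

strong_err = {}
for n_steps in (4, 16, 64, 256):
    # Coarsen the same Brownian path by summing blocks of fine increments.
    dW = dW_fine.reshape(n_paths, n_steps, n_fine // n_steps).sum(axis=2)
    dt = T / n_steps
    S = np.full(n_paths, S0)
    for i in range(n_steps):
        S = S + mu * S * dt + sig * S * dW[:, i]     # Euler-Maruyama step (EQ Q1.10)
    strong_err[n_steps] = np.mean(np.abs(S - S_exact))
    print(f"steps = {n_steps:4d}  dt = {dt:.4f}  E|S_EM - S_exact| = {strong_err[n_steps]:.4f}")
```

Here `dW[:, i]` already has standard deviation $\sqrt{\Delta t}$, so it plays the role of $\sqrt{\Delta t}\,Z$ in EQ Q1.10.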

One mean-reverting SDE deserves a name now, because it powers the interest-rate models of Quant 04 and the volatility models of Quant 03. The Ornstein–Uhlenbeck process $\mathrm{d}X_t = \kappa(\theta - X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$ is pulled back toward a long-run level $\theta$ at speed $\kappa$: unlike Brownian motion, whose variance grows without bound, its variance saturates at $\sigma^2/(2\kappa)$. It is the canonical model for anything that wanders but does not run away: spreads, rates, mean-reverting pairs.
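The saturation can be seen with a generic Euler–Maruyama loop. The helper `euler_maruyama` below is a hypothetical name introduced for this sketch, and the parameter values are assumptions; it takes the drift and diffusion functions of EQ Q1.9 as arguments, so swapping the two lambdas swaps the model. Applied to the Ornstein–Uhlenbeck process, the sample variance should follow $\frac{\sigma^2}{2\kappa}\big(1 - e^{-2\kappa t}\big)$ and level off at $\sigma^2/(2\kappa)$.

```python
import numpy as np

def euler_maruyama(a, b, x0, T, n_steps, n_paths, rng):
    """Simulate dX = a(X, t) dt + b(X, t) dW with the step of EQ Q1.10.

    Hypothetical helper for illustration; returns the terminal values X_T.
    """
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    for i in range(n_steps):
        t = i * dt
        Z = rng.standard_normal(n_paths)
        X = X + a(X, t) * dt + b(X, t) * np.sqrt(dt) * Z
    return X

kappa, theta, sig = 2.0, 0.05, 0.10      # illustrative OU parameters
rng = np.random.default_rng(13)

var_at = {}
for T in (0.1, 0.5, 5.0):
    X = euler_maruyama(lambda x, t: kappa * (theta - x),   # mean-reverting drift
                       lambda x, t: sig,                   # constant diffusion
                       x0=0.20, T=T, n_steps=500, n_paths=20_000, rng=rng)
    var_at[T] = X.var()
    theory = sig**2 / (2 * kappa) * (1 - np.exp(-2 * kappa * T))
    print(f"T = {T:4.1f}  mean = {X.mean():.4f}  var = {var_at[T]:.6f}  theory = {theory:.6f}")
```

By $T = 5$ the starting point has been forgotten: the mean sits near $\theta$ and the variance near $\sigma^2/(2\kappa)$, up to sampling noise and a small time-discretization bias.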

Where the idealization leaks. Itô calculus assumes continuous paths and finite quadratic variation. Markets violate both: prices jump (the 1987 crash was a discontinuity no diffusion can produce), and tails are fatter than Gaussian. The honest repairs are jump-diffusions (a Poisson jump term added to EQ Q1.9), stochastic volatility (make $b$ itself an SDE, as Heston does in Vol · EQ Q3.7), and rough-volatility models that replace $W$ with fractional Brownian motion. None of these abandon Itô; they extend it. Brownian motion remains the load-bearing baseline precisely because its algebra is the one that closes.

PYTHON · RUNNABLE IN-BROWSER

# Verify the Ito drift correction: E[log S_T] uses mu - 0.5*sigma^2, not mu
import numpy as np
rng = np.random.default_rng(2)

S0, mu, sig, T = 100.0, 0.10, 0.40, 1.0
n_paths, n_steps = 200000, 200
dt = T / n_steps

# Euler-Maruyama on GBM:  dS = mu*S dt + sig*S dW,  noise scales with sqrt(dt)
S = np.full(n_paths, S0)
for _ in range(n_steps):
    Z = rng.standard_normal(n_paths)
    S += mu*S*dt + sig*S*np.sqrt(dt)*Z

log_ret = np.log(S / S0)
ito_drift   = (mu - 0.5*sig**2) * T     # correct: Ito's lemma
naive_drift =  mu * T                    # wrong: ordinary calculus

print(f"simulated  E[log S_T/S0] : {log_ret.mean():+.4f}")
print(f"Ito  prediction (mu-.5s^2)T: {ito_drift:+.4f}   <- matches")
print(f"naive prediction  mu*T    : {naive_drift:+.4f}   <- off by 0.5*sig^2*T = {0.5*sig**2*T:.4f}")
print(f"\nmean price E[S_T] still grows at mu: sim {S.mean():.2f} vs S0*e^(muT) {S0*np.exp(mu*T):.2f}")
print("the half-sigma-squared lives in the LOG drift, not the price mean.")


You now have the engine; next we discretize it into something you can price on. Quant 02 collapses continuous GBM onto a recombining binomial tree: choose up/down moves and a risk-neutral probability so the tree's mean and variance match the SDE, then price any option by backward induction. It is Itô made arithmetic — and the most intuitive door into the Black–Scholes formula of Quant 03.

1.R

References

Itô, K. (1944). Stochastic Integral. Proc. Imperial Acad. Tokyo 20(8) — the original construction of the Itô integral and the lemma of §1.4.
Bachelier, L. (1900). Théorie de la spéculation. Ann. Sci. ÉNS 17 — the founding thesis modelling prices as Brownian motion, five years before Einstein.
Øksendal, B. (2003). Stochastic Differential Equations: An Introduction with Applications (6th ed.). Springer — the standard graduate text on Itô calculus and SDEs.
Uhlenbeck, G. E. & Ornstein, L. S. (1930). On the Theory of the Brownian Motion. Phys. Rev. 36 — the mean-reverting process of §1.5.
Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung…. Ann. Phys. 322 — the physical derivation that variance grows linearly in time.
Wiener, N. (1923). Differential Space. J. Math. Phys. 2 — the rigorous existence proof of the process that bears his name.