Trend, seasonality & noise
A time series is a sequence of observations indexed by time, \(y_1, y_2, \ldots, y_T\), where the index is not a label but a coordinate: \(y_t\) and \(y_{t+1}\) are neighbours, and that adjacency carries information. The first reflex of the field is to read the series as a sum of structured parts plus what is left over. The classical decomposition is additive:
\[ y_t = T_t + S_t + R_t \]
Here \(T_t\) is the trend, \(S_t\) the seasonal component with period \(m\), and \(R_t\) the remainder.
This split is descriptive, not causal: it is a lens. Choosing additive versus multiplicative (components multiply rather than add, which suits seasonal swings that scale with the level), or the seasonal period \(m\), is a modelling decision you make by looking at the data. The remainder \(R_t\) is the part we actually want to be boring. If \(R_t\) still wiggles in a predictable way, so that knowing \(R_{t-1}\) helps you guess \(R_t\), then the decomposition left structure on the table, and the chapters that follow (ARIMA, ETS, GARCH) exist to mop it up.
A note on honesty. The classical additive split assumes the trend is smooth and the season has a fixed period and shape. Real series violate both — holidays move, regimes shift, the period itself drifts. Robust modern decompositions (STL, the loess-based method) allow the seasonal shape to evolve and resist outliers; treat any decomposition as a hypothesis to check, not a fact to trust.
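A minimal sketch of the classical additive split may make the assumptions concrete. It estimates the trend with a centered moving average and the seasonal shape by averaging the detrended values at each position in the cycle. The function name `classical_additive`, the synthetic series, and the choice \(m = 12\) are illustrative assumptions, and this is the fixed-shape classical method, not STL.

```python
import numpy as np

def classical_additive(y, m):
    """Split y into trend + seasonal + remainder (additive, even period m)."""
    y = np.asarray(y, dtype=float)
    n, half = len(y), m // 2
    # Centered 2 x m moving average: half weight on the two end points.
    w = np.r_[0.5, np.ones(m - 1), 0.5] / m
    trend = np.full(n, np.nan)  # undefined for the first and last m/2 points
    trend[half:n - half] = np.convolve(y, w, mode="valid")
    detrended = y - trend
    # Mean detrended value at each position in the cycle, centered to sum to zero.
    pattern = np.array([np.nanmean(detrended[i::m]) for i in range(m)])
    pattern -= pattern.mean()
    seasonal = pattern[np.arange(n) % m]
    return trend, seasonal, y - trend - seasonal

# Hypothetical data: slope 0.3, seasonal amplitude 10, period 12, mild noise.
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

trend, seasonal, remainder = classical_additive(y, m=12)
print("seasonal value at t=3 (true peak is 10):", round(seasonal[3], 2))
print("remainder std (noise std is 0.5):", round(np.nanstd(remainder), 2))
```

Because the seasonal pattern is a single averaged shape, this sketch inherits exactly the fixed-period, fixed-shape assumption the note above warns about.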
Stationarity & why it matters
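Here is the assumption almost every classical model needs, and the one a clock loves to break. A series is (weakly) stationary if its statistical character does not depend on when you look at it. Concretely, three things must hold for all \(t\) and all lags \(k\):
\[ \mathbb{E}[y_t] = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2 < \infty, \qquad \operatorname{Cov}(y_t, y_{t-k}) = \gamma_k \tag{EQ T1.2} \]
The mean and variance are constant, and the autocovariance \(\gamma_k\) depends only on the lag \(k\), not on \(t\).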
Why is this the load-bearing assumption? Independent-and-identically-distributed (i.i.d.) data is the comfortable world of the rest of this encyclopedia: each row drawn fresh from one fixed distribution, so a sample average converges to the truth and a single split estimates generalization (the holdout logic of MLOPS · §1.1). A time series is emphatically not i.i.d. — the points are dependent by construction. Stationarity is the weaker substitute: it does not require independence, only that the dependence structure be stable over time. That stability is enough to make estimation and forecasting well-posed.
| Series | Violates | Stationary? | Fix (§1.5) |
|---|---|---|---|
| Linear upward trend | constant mean | no | difference once |
| Variance grows with level | constant variance | no | log / Box–Cox |
| Seasonal sales | constant mean and \(\gamma_k\) | no | seasonal difference |
| White noise | nothing | yes | already there |
| Stable AR(1), \(\lvert\phi\rvert < 1\) | nothing | yes | already there |
Strict vs weak. The definition above is weak (second-order) stationarity — it constrains only the first two moments. Strict stationarity asks that the entire joint distribution be time-invariant, a much stronger demand. For Gaussian processes the two coincide, which is why the weak form is the working definition in practice. Most of forecasting lives on the assumption that, after some transform, the series is weakly stationary.
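As a rough illustration of "does not depend on when you look", one can compare the mean and variance of the first and second halves of a series. This is an informal diagnostic under assumed synthetic data, not a formal test: matching halves do not prove stationarity. The helper name `half_stats` is hypothetical.

```python
import numpy as np

def half_stats(x):
    """Return (mean, variance) for the first and the second half of x."""
    a, b = np.array_split(np.asarray(x, dtype=float), 2)
    return (a.mean(), a.var()), (b.mean(), b.var())

rng = np.random.default_rng(42)
T = 2000
noise = rng.normal(0, 1, T)              # white noise: stationary
trended = 0.01 * np.arange(T) + noise    # linear trend: the mean drifts upward

for name, x in [("white noise", noise), ("linear trend", trended)]:
    (m1, v1), (m2, v2) = half_stats(x)
    print(f"{name:13s} means {m1:7.3f} / {m2:7.3f}   variances {v1:6.3f} / {v2:6.3f}")
```

For the trended series the two half-means should sit roughly 10 apart (slope 0.01 over 1000 steps), which is the "constant mean" violation from the first row of the table.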
Autocorrelation — ACF & PACF
If the points are dependent, the natural question is: how dependent, and at what range? The autocorrelation function (ACF) answers it by correlating the series with a delayed copy of itself. At lag \(k\) it is the covariance \(\gamma_k\) from EQ T1.2, normalized by the variance so it lives in \([-1, +1]\):
\[ \rho_k = \frac{\gamma_k}{\gamma_0} \tag{EQ T1.3} \]
where \(\gamma_0\) is the variance of the series, so \(\rho_0 = 1\).
The ACF has a blind spot. If today depends on yesterday, then today also correlates with the day before — not directly, but through yesterday. The ACF cannot tell a direct link from a relayed one. The partial autocorrelation function (PACF) closes that gap: \(\alpha_k\) is the correlation between \(y_t\) and \(y_{t-k}\) after removing the linear effect of all the lags in between. It is the direct dependence at range \(k\), with the relayed paths stripped out.
For the workhorse AR(1) process \(y_t = \phi\, y_{t-1} + \varepsilon_t\), the theory is exact and worth memorizing: the ACF is a clean geometric decay, \(\rho_k = \phi^k\), and the PACF is a single spike of height \(\phi\) at lag 1 and exactly zero everywhere after. That pair — exponential ACF, one-spike PACF — is the textbook AR(1) signature, and it is what the next instrument lets you see.
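The geometric decay follows in two lines. The derivation below is a sketch assuming a zero-mean, stable AR(1) with \(|\phi| < 1\) and white-noise shocks \(\varepsilon_t\) that are uncorrelated with past values of the series: multiply the AR(1) equation by \(y_{t-k}\) and take expectations.

```latex
\begin{align*}
\gamma_k &= \mathbb{E}[y_t\, y_{t-k}]
          = \phi\, \mathbb{E}[y_{t-1}\, y_{t-k}] + \mathbb{E}[\varepsilon_t\, y_{t-k}]
          = \phi\, \gamma_{k-1}, \qquad k \ge 1, \\
\gamma_k &= \phi^{k}\, \gamma_0
\quad\Longrightarrow\quad
\rho_k = \frac{\gamma_k}{\gamma_0} = \phi^{k}.
\end{align*}
```

The cross term vanishes because \(y_{t-k}\) is built only from shocks up to time \(t-k\), which \(\varepsilon_t\) does not touch.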
```python
# Simulate AR(1), then compute and plot its sample ACF (EQ T1.3).
import numpy as np

rng = np.random.default_rng(0)
phi, T = 0.7, 600
eps = rng.normal(0, 1, T)
y = np.zeros(T)
for t in range(1, T):  # y_t = phi * y_{t-1} + eps_t
    y[t] = phi * y[t-1] + eps[t]
y = y - y.mean()  # center so the ACF formula is clean

def acf(x, K):  # sample autocorrelation up to lag K
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x)-k] * x[k:]) / denom for k in range(K+1)])

K = 12
r = acf(y, K)
band = 1.96 / np.sqrt(T)  # +/- white-noise significance band
print(" lag sample ACF theory phi^k")
for k in range(K+1):
    flag = " *" if abs(r[k]) > band and k > 0 else ""
    print(f" {k:3d} {r[k]:8.3f} {phi**k:8.3f}{flag}")
print(f"\nwhite-noise band +/-{band:.3f}; bars marked * are real memory.")
print("note the sample ACF tracks the geometric phi^k decay of an AR(1).")

# plot_xy is assumed to be the plotting helper supplied by the page's interactive
# runtime; the guard lets the script run where that helper is not defined.
if "plot_xy" in globals():
    plot_xy(list(range(K+1)), list(r))
```
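The one-spike PACF half of the AR(1) signature can be checked in the same spirit. The sketch below estimates \(\alpha_k\) as the last coefficient of a least-squares regression of \(y_t\) on its first \(k\) lags, which is one standard way to compute a sample PACF. The function name `pacf_ols` is hypothetical, and the simulation repeats the AR(1) setup so the block runs on its own.

```python
import numpy as np

def pacf_ols(x, K):
    """Sample PACF at lags 1..K: last OLS coefficient when regressing on k lags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = []
    for k in range(1, K + 1):
        # Column j holds the series lagged by j+1 steps, aligned with target x[k:].
        X = np.column_stack([x[k - j - 1 : n - j - 1] for j in range(k)])
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(0)
phi, T = 0.7, 600
eps = rng.normal(0, 1, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]

alpha = pacf_ols(y, K=6)
band = 1.96 / np.sqrt(T)
for k, a in enumerate(alpha, start=1):
    print(f"lag {k}: PACF {a:7.3f}{'  *' if abs(a) > band else ''}")
```

Lag 1 should land near \(\phi = 0.7\); the later lags should hover around zero, with the occasional chance crossing of the band.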
White noise & the random walk
Two reference processes anchor the whole subject — one the picture of "no structure," the other the most important non-stationary series in practice. White noise is the boring ideal: a sequence of uncorrelated, zero-mean, constant-variance shocks. It is stationary by construction and, crucially, unforecastable beyond its mean.
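One way to ask whether a series (or a model's remainder) is plausibly white noise is the Ljung–Box statistic from the reference list, \(Q = T(T+2)\sum_{k=1}^{h} r_k^2 / (T-k)\), which is compared against a chi-square distribution with \(h\) degrees of freedom. The sketch below is a hand-rolled version under assumed synthetic data; the function name `ljung_box_q` is hypothetical, and the critical value is hard-coded for \(h = 10\) at the 5% level.

```python
import numpy as np

def ljung_box_q(x, h):
    """Ljung-Box Q statistic over lags 1..h."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x * x)
    lags = np.arange(1, h + 1)
    r = np.array([np.sum(x[:n - k] * x[k:]) / denom for k in lags])
    return n * (n + 2) * np.sum(r**2 / (n - lags))

CHI2_95_DF10 = 18.307  # 95th percentile of chi-square with 10 degrees of freedom

rng = np.random.default_rng(3)
T = 500
noise = rng.normal(0, 1, T)
ar1 = np.zeros(T)
for t in range(1, T):  # AR(1) with phi = 0.7, driven by the same shocks
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

q_noise = ljung_box_q(noise, h=10)
q_ar = ljung_box_q(ar1, h=10)
print(f"white noise Q = {q_noise:7.1f}  exceeds critical value: {q_noise > CHI2_95_DF10}")
print(f"AR(1)       Q = {q_ar:7.1f}  exceeds critical value: {q_ar > CHI2_95_DF10}")
```

A large \(Q\) says the autocorrelations are jointly too big to be noise. By construction, genuine white noise still exceeds the 5% critical value about one time in twenty.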
Now cumulate that noise. A random walk sets each value equal to the previous one plus a fresh independent shock. It is the running sum of white noise, and it is the canonical model for an unpredictable price, a diffusing particle, or any quantity that wanders without an anchor:
\[ y_t = y_{t-1} + \varepsilon_t = y_0 + \sum_{i=1}^{t} \varepsilon_i \]
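Because a random walk is a sum of \(t\) independent shocks, its variance is \(t\,\sigma^2\) when \(y_0\) is fixed: it grows without bound, which is exactly the "constant variance" condition failing. The sketch below checks this across many simulated paths; the path count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, T = 5000, 400
shocks = rng.normal(0, 1, (n_paths, T))   # white noise with sigma^2 = 1
walks = np.cumsum(shocks, axis=1)         # walks[:, t-1] is y_t with y_0 = 0

checkpoints = (50, 100, 200, 400)
var_at = {t: walks[:, t - 1].var() for t in checkpoints}
for t in checkpoints:
    print(f"t = {t:3d}   variance across paths {var_at[t]:7.1f}   theory {t}")
```

The variance is taken across paths at a fixed time, not along a single path, because a single wandering path has no stable variance to estimate.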
The contested part, stated plainly. Whether a given real series — GDP, a stock index, an exchange rate — is "trend-stationary" (a deterministic trend plus stationary noise) or "difference-stationary" (a random walk with drift) is genuinely hard to decide from finite data, and decades of econometrics have been spent arguing specific cases. The two imply very different forecasts and very different long-run behaviour. Unit-root tests give evidence, not certainty; honest practice reports the ambiguity rather than hiding it.
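To make "evidence, not certainty" concrete, here is a bare-bones Dickey–Fuller regression: fit \(\Delta y_t = a + \rho\, y_{t-1} + e_t\) and look at the t-statistic on \(\rho\). This is a sketch only. It omits the lag augmentation of the full ADF test, the function name `df_tstat` is hypothetical, and the threshold \(-2.86\) is the approximate asymptotic 5% critical value for the constant-only case, which comes from the Dickey–Fuller distribution rather than the usual t-table.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on rho in  diff(y)_t = a + rho * y_{t-1} + e_t  (no lag terms)."""
    y = np.asarray(y, dtype=float)
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)               # residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of rho
    return beta[1] / se

DF_CRIT_5PCT = -2.86  # approximate asymptotic 5% value, constant-only case

rng = np.random.default_rng(11)
T = 500
eps = rng.normal(0, 1, T)
walk = np.cumsum(eps)                 # unit root: difference-stationary
ar1 = np.zeros(T)
for t in range(1, T):                 # stable AR(1), phi = 0.5
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

t_walk, t_ar = df_tstat(walk), df_tstat(ar1)
print(f"random walk: t = {t_walk:6.2f}   rejects unit root: {t_walk < DF_CRIT_5PCT}")
print(f"AR(1):       t = {t_ar:6.2f}   rejects unit root: {t_ar < DF_CRIT_5PCT}")
```

The stable AR(1) should reject decisively. The random walk usually does not, but at the 5% level it will reject by chance in roughly one run out of twenty, and near-unit-root series blur the line further.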
Differencing & transforms to stationarity
So a great many real series are not stationary — and the entire toolkit needs them to be. The fix is a pair of cheap, reversible transforms that attack the two ways stationarity fails: a non-constant mean, and a non-constant variance.
The mean problem (trend) is killed by differencing: replace the series with the step-to-step changes. Define the difference operator \(\nabla y_t = y_t - y_{t-1}\). One difference removes a linear trend; a second difference removes a quadratic one. The payoff is exact for the random walk:
\[ \nabla y_t = y_t - y_{t-1} = \varepsilon_t \]
so the first difference of a random walk is white noise.
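A tiny noise-free check of the two claims about differencing, and of its reversibility: given the first value, a cumulative sum undoes the difference. The arrays are made-up illustrative values.

```python
import numpy as np

t = np.arange(6)
linear = 3.0 + 0.5 * t
quadratic = t.astype(float) ** 2

d_linear = np.diff(linear)             # every entry is the slope, 0.5
d2_quadratic = np.diff(quadratic, n=2) # every entry is 2.0
print(d_linear, d2_quadratic)

# Reversibility: keep the first value, then cumulate the differences.
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])
d = np.diff(y)
restored = np.concatenate(([y[0]], y[0] + np.cumsum(d)))
print(np.allclose(restored, y))        # True
```

The stored first value is the price of reversibility: each round of differencing discards one observation that must be kept aside to map forecasts back to the original scale.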
The variance problem (swings that widen as the series grows) is killed by a variance-stabilizing transform. The log is the everyday choice; the Box–Cox family generalizes it with a single tunable power \(\lambda\), smoothly spanning from "no transform" (\(\lambda = 1\)) through "square root" (\(\lambda = 0.5\)) to "log" (\(\lambda \to 0\)):
\[ y_t^{(\lambda)} = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \log y_t, & \lambda = 0 \end{cases} \]
The transform requires \(y_t > 0\).
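The Box–Cox family and its inverse are short enough to write out directly. The sketch below is a hand-rolled version with hypothetical names `box_cox` and `inv_box_cox`; it checks numerically that a tiny \(\lambda\) approaches the log and that the transform round-trips.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of strictly positive data."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse transform, mapping back to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

y = np.array([1.0, 10.0, 100.0])
near_log = box_cox(y, 1e-6)            # lambda -> 0 approaches log(y)
roundtrip = inv_box_cox(box_cox(y, 0.5), 0.5)

print("lambda = 1e-6:", near_log)
print("log(y):       ", np.log(y))
print("round trip ok:", np.allclose(roundtrip, y))
```

At \(\lambda = 1\) the transform is just \(y - 1\), a shift that leaves the shape of the series untouched, which is why it counts as "no transform".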
```python
# Difference a trending series and watch the variance collapse (EQ T1.7).
import numpy as np

rng = np.random.default_rng(1)
T = 400
trend = 0.5 * np.arange(T)  # a steady linear climb: non-stationary mean
y = trend + np.cumsum(rng.normal(0, 1, T))  # trend + a random-walk wander on top
d1 = np.diff(y)   # first difference: nabla y_t = y_t - y_{t-1}
d2 = np.diff(d1)  # second difference

def stats(name, x):
    print(f"{name:18s} mean {x.mean():8.3f} variance {x.var():12.1f}")

print("level vs differenced series:")
stats("y (level)", y)          # huge variance: the trend dominates
stats("diff once (d=1)", d1)   # variance plummets; mean ~ the slope 0.5
stats("diff twice (d=2)", d2)  # flat mean ~0; over-differenced -> var rises again
print("\none difference removes the trend (mean -> the slope, variance collapses);")
print("a SECOND difference over-does it -- variance climbs back. Difference sparingly.")

# plot_xy is assumed to be the plotting helper supplied by the page's interactive
# runtime; the guard lets the script run where that helper is not defined.
if "plot_xy" in globals():
    plot_xy(list(range(len(d1))), list(d1))  # the stationary-looking differenced series
```
You now have the vocabulary; ARIMA gives it grammar. Once a series is stationary (variance-stabilized, then differenced \(d\) times), its leftover memory is the AR and MA structure the correlograms revealed. Chapter 02 fuses the three letters into one of the most widely used forecasting models: the Integration order \(d\) from this chapter, the AutoRegression order \(p\) read off the PACF, and the Moving Average order \(q\) read off the ACF.
References
- Box, G. E. P., Jenkins, G. M., Reinsel, G. C. & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control (5th ed.).
- Hamilton, J. D. (1994). Time Series Analysis.
- Hyndman, R. J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.).
- Dickey, D. A. & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root.
- Box, G. E. P. & Cox, D. R. (1964). An Analysis of Transformations.
- Ljung, G. M. & Box, G. E. P. (1978). On a Measure of Lack of Fit in Time Series Models.
- Yule, G. U. (1927). On a Method of Investigating Periodicities in Disturbed Series.