Simple exponential smoothing
Start with a series that has no trend and no season — just a level that wanders, buried in noise. A naïve forecast uses only the last value; a long moving average uses many values but weights them all equally, which is plainly wrong: a reading from a year ago should not count as much as yesterday's. Simple exponential smoothing (SES) resolves the tension with one parameter. Maintain a running estimate of the level \(\ell_t\) and, at every new observation, nudge it toward the latest value by a fraction \(\alpha\):
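Written out, the nudge described above is the usual SES recurrence. This is a sketch of the display the paragraph introduces; it matches the update line that the numpy listing further down labels EQ T3.1, and the forecast at every horizon is simply the current level.

```latex
\[
\ell_t = \alpha\, y_t + (1-\alpha)\,\ell_{t-1}, \qquad 0 \le \alpha \le 1,
\qquad \hat{y}_{t+h \mid t} = \ell_t \quad (h \ge 1)
\]
```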
The error-correction form makes the "learning rate" reading explicit. Rearranging EQ T3.1 around the one-step forecast error \(e_t = y_t - \ell_{t-1}\):
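Substituting \(e_t\) into the recurrence \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\) gives the error-correction form in one step. The layout below is a reconstruction from the definitions in the text, not necessarily the original display.

```latex
\[
\ell_t = \ell_{t-1} + \alpha\,(y_t - \ell_{t-1}) = \ell_{t-1} + \alpha\, e_t
\]
```

With \(\alpha = 0.3\), a forecast error of +10 moves the level up by 3. Setting \(\alpha = 1\) copies the last observation (the naïve forecast), while \(\alpha = 0\) never updates the level at all.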
Why "exponential"? Unrolling the recurrence shows the forecast is a weighted average of all past observations, with weights that decay geometrically into the past:
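Substituting the recurrence into itself \(t\) times gives the unrolled form. This is a sketch reconstructed from the recurrence; \(\ell_0\) denotes the initial level, whatever value it was given.

```latex
\[
\ell_t = \alpha \sum_{j=0}^{t-1} (1-\alpha)^{j}\, y_{t-j} \;+\; (1-\alpha)^{t}\,\ell_0
\]
```

The weights \(\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \dots\) sum to \(1 - (1-\alpha)^t\), and the remainder sits on the initial level. With \(\alpha = 0.3\) the first three weights are 0.3, 0.21, and 0.147.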
# Simple exponential smoothing in numpy: fit a level, print fitted vs actual
import numpy as np
rng = np.random.default_rng(0)
# a wandering level (random walk) plus observation noise -- no trend, no season
n = 24
level_true = 50 + np.cumsum(rng.normal(0, 1.2, n))
y = level_true + rng.normal(0, 2.0, n)
alpha = 0.3
ell = y[0]  # initialise the level at the first observation
fitted = np.empty(n)
fitted[0] = ell
for t in range(1, n):
    fitted[t] = ell                         # one-step forecast BEFORE seeing y[t] is the old level
    ell = alpha * y[t] + (1 - alpha) * ell  # EQ T3.1 update
sse = np.sum((y[1:] - fitted[1:]) ** 2)
print(f"alpha = {alpha}, one-step SSE = {sse:.2f}, final level = {ell:.2f}")
print(f"{'t':>2} {'actual':>7} {'forecast':>8} {'error':>7}")
for t in range(1, 8):
    print(f"{t:2d} {y[t]:7.2f} {fitted[t]:8.2f} {y[t]-fitted[t]:+7.2f}")
# plot_xy is not defined in this listing; it is assumed to be a plotting helper
# supplied by the host page, so the call is skipped when it is unavailable
if "plot_xy" in globals():
    plot_xy(list(range(n)), list(y))  # the noisy series whose level the fit tracks
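The geometric-weight reading of SES can be checked numerically. The sketch below is an illustration, not part of the original listing: the helper names `ses_level` and `ses_weights` are hypothetical, and the initial level `ell0` is kept separate from the data so the residual weight \((1-\alpha)^t\) on it is visible.

```python
import numpy as np

def ses_level(y, alpha, ell0):
    """Run the SES recurrence over every observation, starting from ell0."""
    ell = ell0
    for obs in y:
        ell = alpha * obs + (1 - alpha) * ell
    return ell

def ses_weights(t, alpha):
    """Weights on y_t, y_{t-1}, ..., y_1 implied by unrolling the recurrence."""
    return alpha * (1 - alpha) ** np.arange(t)

rng = np.random.default_rng(1)
y = 50 + rng.normal(0, 2.0, 12)      # illustrative data only
alpha, ell0 = 0.3, 50.0

w = ses_weights(len(y), alpha)
residual = (1 - alpha) ** len(y)     # weight left on the initial level
unrolled = np.dot(w, y[::-1]) + residual * ell0
recursive = ses_level(y, alpha, ell0)

print(f"recursive level = {recursive:.6f}")
print(f"unrolled level  = {unrolled:.6f}")
print(f"weights + residual sum to {w.sum() + residual:.6f}")
```

The two levels agree to floating-point precision, and the weights plus the residual sum to one, which is what makes the forecast a proper weighted average.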
Holt's linear trend method
SES forecasts a flat line, so it lags badly on any series that is climbing or falling: it is forever chasing a level that has already moved on. Holt (1957) added a second smoothed component — a trend \(b_t\), the estimated change per period — updated by its own smoothing parameter \(\beta\). Now two recurrences run in lockstep, and the forecast extrapolates the trend forward:
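In symbols, the two recurrences and the forecast take the following standard form. It is a sketch consistent with the `holt` function in the listing below, where the level update uses the previous level plus the previous trend.

```latex
\[
\begin{aligned}
\ell_t &= \alpha\, y_t + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1} \\
\hat{y}_{t+h \mid t} &= \ell_t + h\, b_t
\end{aligned}
\]
```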
One honest caveat: a linear trend projected far into the future is usually too aggressive — real series flatten. The standard fix is the damped trend of Gardner & McKenzie (1985), which multiplies the trend by a damping factor \(0 < \phi < 1\) so the forecast bends toward a horizontal asymptote:
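A sketch of the damped recurrences, in the form given by Hyndman & Athanasopoulos (2021); setting \(\phi = 1\) recovers Holt's linear method.

```latex
\[
\begin{aligned}
\ell_t &= \alpha\, y_t + (1-\alpha)\,(\ell_{t-1} + \phi\, b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,\phi\, b_{t-1} \\
\hat{y}_{t+h \mid t} &= \ell_t + (\phi + \phi^2 + \dots + \phi^h)\, b_t
\;\longrightarrow\; \ell_t + \frac{\phi}{1-\phi}\, b_t \quad (h \to \infty)
\end{aligned}
\]
```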
# Holt's linear method: vary alpha and beta, forecast h steps ahead (EQ T3.4)
import numpy as np
# a trending series: level rises ~1.5/period with a little noise
n = 30
y = 10 + 1.5 * np.arange(n) + np.array([0, 1, -1, 2, 0, -2, 1, 3, -1, 0,
                                        2, -1, 1, 0, -2, 1, 2, -1, 0, 1,
                                        -1, 2, 0, 1, -2, 0, 1, -1, 2, 0], dtype=float)
def holt(y, alpha, beta, h=4):
    """Run Holt's recurrences over y; return final level, final trend, h forecasts."""
    ell, b = y[0], y[1] - y[0]  # init: level=y0, trend=first difference
    for t in range(1, len(y)):
        prev = ell
        ell = alpha * y[t] + (1 - alpha) * (ell + b)  # level
        b = beta * (ell - prev) + (1 - beta) * b      # trend
    fc = [ell + (i + 1) * b for i in range(h)]        # straight-line forecast
    return ell, b, fc
print(f" {'alpha':>5} {'beta':>5} | {'final level':>11} {'trend':>7}  4-step forecast")
for alpha, beta in [(0.8, 0.2), (0.5, 0.1), (0.3, 0.05)]:
    ell, b, fc = holt(y, alpha, beta)
    print(f" {alpha:5.2f} {beta:5.2f} | {ell:11.2f} {b:7.3f}  "
          + " ".join(f"{v:6.1f}" for v in fc))
print("\nhigher beta -> trend reacts faster to slope changes (and to noise).")
# plot_xy is not defined in this listing; it is assumed to be a plotting helper
# supplied by the host page, so the call is skipped when it is unavailable
if "plot_xy" in globals():
    plot_xy(list(range(n)), list(y))
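The damped trend mentioned earlier needs only a small change to the same loop. The sketch below is an illustration under stated assumptions: `damped_holt` is a hypothetical helper, the initial states copy the simple choices used above, and the values of `alpha`, `beta`, and `phi` are picked by hand, not optimised.

```python
import numpy as np

def damped_holt(y, alpha, beta, phi, h=4):
    """Holt's method with a damped trend; phi = 1.0 gives the undamped method."""
    ell, b = y[0], y[1] - y[0]                 # same simple initialisation as above
    for t in range(1, len(y)):
        prev = ell
        ell = alpha * y[t] + (1 - alpha) * (prev + phi * b)
        b = beta * (ell - prev) + (1 - beta) * phi * b
    # forecast multiplier phi + phi^2 + ... + phi^k for k = 1..h
    damp = np.cumsum(phi ** np.arange(1, h + 1))
    return ell, b, ell + damp * b

y = 10 + 1.5 * np.arange(30)                   # noise-free ramp for a clean comparison
ell_lin, b_lin, fc_lin = damped_holt(y, 0.5, 0.1, phi=1.0, h=20)
ell_d, b_d, fc_damp = damped_holt(y, 0.5, 0.1, phi=0.9, h=20)
asymptote = ell_d + b_d * 0.9 / (1 - 0.9)      # limit of the damped forecast

print("h   linear   damped")
for k in (1, 5, 10, 20):
    print(f"{k:2d} {fc_lin[k - 1]:8.2f} {fc_damp[k - 1]:8.2f}")
print(f"damped forecast levels off near {asymptote:.2f}")
```

The linear forecast keeps adding the same slope at every step, while the damped forecast adds a shrinking increment and stays below its asymptote.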
A naming map for the confused. SES is "single" smoothing; Holt is "double"; Holt-Winters (next) is "triple". The labels just count how many recurrences run — one per component you choose to track: level, then trend, then season.
Holt-Winters seasonal method
Most operational series breathe on a calendar: weekly retail, daily electricity, monthly tourism. Winters (1960) completed Holt's method by adding a third smoothed component: a set of \(m\) seasonal indices \(s_t\), one per position in the cycle (\(m=12\) for monthly data with a yearly cycle, \(m=7\) for daily data with a weekly cycle), updated by a third smoothing parameter \(\gamma\). The result, Holt-Winters, smooths level, trend, and season simultaneously. There are two flavours, depending on whether seasonal swings are a fixed amount (additive) or a fixed fraction of the level (multiplicative).
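A minimal sketch of the additive flavour follows, using the recurrences as written in Hyndman & Athanasopoulos (2021). The function name `holt_winters_additive`, the initialisation heuristic (first two cycles), and the parameter values are assumptions made for illustration; real software optimises all of them.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters with hand-set parameters; needs at least two full cycles."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")
    # Heuristic initial states (an assumption, one choice among many):
    first_mean = y[:m].mean()
    b = (y[m:2 * m].mean() - first_mean) / m          # average change per period
    centre = (m - 1) / 2
    season = y[:m] - (first_mean + b * (np.arange(m) - centre))  # detrended first cycle
    ell = first_mean + b * centre                     # level at the end of the first cycle
    for t in range(m, len(y)):
        s_old = season[t % m]                         # s_{t-m}
        prev_ell, prev_b = ell, b
        ell = alpha * (y[t] - s_old) + (1 - alpha) * (prev_ell + prev_b)
        b = beta * (ell - prev_ell) + (1 - beta) * prev_b
        season[t % m] = gamma * (y[t] - prev_ell - prev_b) + (1 - gamma) * s_old
    n = len(y)
    # forecast = level + k steps of trend + latest index for that cycle position
    return np.array([ell + (k + 1) * b + season[(n + k) % m] for k in range(h)])

# Noise-free quarterly example: linear trend plus a fixed seasonal pattern
m = 4
pattern = np.array([3.0, -1.0, -4.0, 2.0])
t_all = np.arange(24)
series = 100 + 0.5 * t_all + pattern[t_all % m]
train, future = series[:20], series[20:]
fc = holt_winters_additive(train, m=m, alpha=0.4, beta=0.1, gamma=0.3, h=4)
print("forecast:", np.round(fc, 2))
print("actual:  ", np.round(future, 2))
```

On this noise-free series the forecast reproduces the next cycle, which is a quick sanity check on the indexing. The multiplicative flavour follows the same pattern with the seasonal index divided out of \(y_t\) in the level update and multiplied into the forecast.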
The ETS state-space framework
For more than forty years exponential smoothing was a bag of recurrences with no probability model behind them: you could forecast, but you could not say how uncertain the forecast was, nor choose a method by a principled criterion. Hyndman, Koehler, Snyder & Grose (2002) and Hyndman, Koehler, Ord & Snyder (2008) fixed that by showing every smoothing method is the point forecast of an underlying state-space model with a single source of error. This is the ETS family: Error · Trend · Season.
ETS classifies a model by a three-letter code: Error ∈ {A, M}, Trend ∈ {N, A, Ad}, Season ∈ {N, A, M}, where A is additive, M multiplicative, N none, and Ad additive damped. So ETS(A,N,N) is SES with additive noise, ETS(A,A,N) is Holt, ETS(A,A,A) is additive Holt-Winters, and ETS(M,A,M) is multiplicative Holt-Winters with multiplicative errors. These sets give 2 × 3 × 3 = 18 combinations (the full taxonomy, which also allows multiplicative trends, has 30); the practical recipe is to let software fit the candidates and pick by AIC.
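For the simplest case, ETS(A,N,N), the single-source-of-error model can be written in two lines. This is the textbook form from Hyndman & Athanasopoulos (2021), offered as a sketch; \(\sigma^2\) is the innovation variance, a symbol not introduced elsewhere in this text.

```latex
\[
\begin{aligned}
y_t &= \ell_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, \sigma^2) \\
\ell_t &= \ell_{t-1} + \alpha\, \varepsilon_t
\end{aligned}
\]
```

Substituting \(\varepsilon_t = y_t - \ell_{t-1}\) into the second line gives back the SES recurrence \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\), so the point forecasts are unchanged and the model only adds a distribution around them.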
| Method | Components (E,T,S) | ETS code | Forecast shape |
|---|---|---|---|
| SES | level only | (A,N,N) | flat line |
| Holt | level + trend | (A,A,N) | straight line |
| Damped Holt | level + damped trend | (A,Ad,N) | bends to asymptote |
| Additive HW | level + trend + season | (A,A,A) | line + fixed season |
| Multiplicative HW | level + trend + ×season | (M,A,M) | line × growing season |
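One practical payoff of the state-space view is a forecast variance. For ETS(A,N,N) the \(h\)-step variance is \(\sigma^2\,[1 + (h-1)\alpha^2]\), as tabulated in Hyndman & Athanasopoulos (2021). The sketch below turns that into an approximate 95% interval; the helper name `ses_interval`, the plain mean-squared-residual estimate of \(\sigma^2\), and the Gaussian multiplier 1.96 are simplifying assumptions.

```python
import numpy as np

def ses_interval(y, alpha, h=5, z=1.96):
    """Point forecast and approximate Gaussian interval for ETS(A,N,N)."""
    y = np.asarray(y, dtype=float)
    ell = y[0]                                  # level initialised at the first value
    errors = []
    for obs in y[1:]:
        err = obs - ell                         # one-step error e_t = y_t - l_{t-1}
        errors.append(err)
        ell = ell + alpha * err                 # error-correction update
    sigma2 = np.mean(np.square(errors))         # simple residual variance estimate
    steps = np.arange(1, h + 1)
    sd = np.sqrt(sigma2 * (1 + (steps - 1) * alpha ** 2))
    return ell, ell - z * sd, ell + z * sd

rng = np.random.default_rng(0)
y = 50 + np.cumsum(rng.normal(0, 1.0, 60)) + rng.normal(0, 2.0, 60)
point, lower, upper = ses_interval(y, alpha=0.3, h=5)
width = upper - lower

for k in range(5):
    print(f"h={k + 1}: {point:.2f}  [{lower[k]:.2f}, {upper[k]:.2f}]")
```

The point forecast stays flat while the interval widens with the horizon, and it widens faster for larger \(\alpha\) because each shock moves the level by more.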
The empirical verdict. In the M3 competition (3,003 series) and again in M4 (100,000 series), simple exponential-smoothing and ETS variants — especially damped trend — were brutally hard to beat; the M4 winner was a hybrid that combined exponential smoothing with a neural net (Smyl's ES-RNN). The lesson the field keeps relearning: for a single, short, noisy series, a one-parameter smoother often beats a deep model, and any serious forecaster keeps ETS as the baseline that earns its keep.
Choosing the smoothing parameters
You do not set \(\alpha, \beta, \gamma\) by hand. The standard procedure picks them — together with the initial states \(\ell_0, b_0, s_0\) — by minimising the in-sample sum of squared one-step errors (equivalently, maximising the Gaussian likelihood of EQ T3.8):
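A sketch of the objective, reconstructed from the description above: \(T\) is the number of observations and \(\hat{y}_{t \mid t-1}\) the one-step forecast produced by the chosen recurrences. Both symbols are introduced here for illustration.

```latex
\[
(\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\ell}_0, \hat{b}_0, \hat{s}_0)
= \arg\min_{\alpha,\,\beta,\,\gamma,\,\ell_0,\,b_0,\,s_0}
\; \sum_{t=1}^{T} \bigl( y_t - \hat{y}_{t \mid t-1} \bigr)^2
\]
```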
Two cautions experts will raise. First, do not minimise SSE on the data you will also report accuracy on; hold out the tail of the series, or use time-series cross-validation (rolling-origin evaluation, Time Series 01), or trust the AIC from the likelihood. Second, an optimiser will happily push \(\alpha \to 1\) on a series that is really a random walk: a correct answer that looks like overfitting but is not. Tracing the SSE objective for SES against \(\alpha\) shows its shape: usually a smooth bowl with a clear minimum, occasionally flat (the data barely constrain \(\alpha\)).
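The shape of the SES objective can be traced with a plain grid search. This is a minimal sketch under stated assumptions: `ses_sse` is a hypothetical helper, the initial level is fixed at the first observation instead of being optimised, and the simulated series is illustrative only.

```python
import numpy as np

def ses_sse(y, alpha):
    """In-sample sum of squared one-step errors for SES with level initialised at y[0]."""
    ell = y[0]
    sse = 0.0
    for obs in y[1:]:
        err = obs - ell        # one-step forecast error
        sse += err ** 2
        ell += alpha * err     # error-correction update
    return sse

rng = np.random.default_rng(0)
n = 200
level = 50 + np.cumsum(rng.normal(0, 1.0, n))   # wandering level
y = level + rng.normal(0, 2.0, n)               # plus observation noise

alphas = np.linspace(0.01, 1.0, 100)
sse = np.array([ses_sse(y, a) for a in alphas])
best = alphas[np.argmin(sse)]

print(f"grid minimum at alpha = {best:.2f}, SSE = {sse.min():.1f}")
print(f"SSE at alpha = 1 (naive forecast) = {ses_sse(y, 1.0):.1f}")
```

A grid is crude but makes the curve visible; in practice a numerical optimiser would refine the minimum and estimate the initial level at the same time. At \(\alpha = 1\) the objective reduces to the sum of squared first differences, which is the naïve forecast's SSE.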
Exponential smoothing models the mean of a series and treats the variance as a constant nuisance. For financial returns that assumption is exactly backwards: the mean is near-unforecastable but the variance clusters — calm begets calm, a shock begets shocks. Time Series 04 turns the smoothing machinery loose on the variance itself: ARCH, GARCH, and the volatility models that price risk.
References
- Holt, C. C. (2004, orig. 1957). Forecasting seasonals and trends by exponentially weighted moving averages.
- Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages.
- Hyndman, R. J., Koehler, A. B., Ord, J. K. & Snyder, R. D. (2008). Forecasting with Exponential Smoothing: The State Space Approach.
- Hyndman, R. J., Koehler, A. B., Snyder, R. D. & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods.
- Gardner, E. S. & McKenzie, E. (1985). Forecasting trends in time series.
- Makridakis, S., Spiliotis, E. & Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods.
- Hyndman, R. J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.), Ch. 8.