The Dixon-Coles model is one of the most widely cited academic frameworks for football match prediction. Published in 1997, it addresses a specific flaw in standard Poisson regression and remains a common benchmark against which newer models are measured.
The Problem with Basic Poisson
Standard Poisson regression models each team's goals independently. This works reasonably well overall, but it systematically misprices the lowest-scoring results: it underestimates the probability of 0-0 and 1-1 draws and overestimates 1-0 and 0-1 scorelines.
In reality, there is a slight positive dependence between low goal counts. When one team struggles to score, the match environment often suppresses the other team's attacking output too (defensive tactics, low tempo, weather). Basic Poisson misses this.
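To make the baseline concrete, here is a minimal sketch of the independent model, in which a scoreline's probability is simply the product of two Poisson probabilities. The function names and the example rates (1.5 and 1.1) are illustrative assumptions, not fitted values.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def independent_score_prob(home_goals: int, away_goals: int,
                           home_rate: float, away_rate: float) -> float:
    """Joint scoreline probability assuming the two goal counts are independent."""
    return poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)


# Illustrative rates only (assumed, not estimated from any data).
for score in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    prob = independent_score_prob(*score, home_rate=1.5, away_rate=1.1)
    print(score, round(prob, 4))
```

These four scorelines are exactly the ones the independence assumption gets wrong.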
The Dixon-Coles Correction
Dixon and Coles introduced a dependence parameter rho that adjusts the joint probability of the four lowest-scoring results and leaves every other scoreline unchanged:
- For 0-0: multiply the Poisson probability by (1 - lambda x mu x rho)
- For 1-0: multiply by (1 + mu x rho)
- For 0-1: multiply by (1 + lambda x rho)
- For 1-1: multiply by (1 - rho)
Where lambda and mu are the expected goals for home and away teams. The parameter rho is typically a small negative number (around -0.13 to -0.08), which increases the probability of 0-0 and 1-1 and decreases 1-0 and 0-1.
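The four multipliers can be written as one small function. The sketch below is a direct transcription of the list above; the names tau and dc_score_prob and the example values (lambda 1.5, mu 1.1, rho -0.1) are assumptions made for illustration.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Correction factor for home goals x and away goals y."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0  # every other scoreline is left unchanged


def dc_score_prob(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Corrected probability of the scoreline x-y."""
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


# Assumed example values; a negative rho lifts 0-0 and 1-1 and trims 1-0 and 0-1.
for score in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(score, round(tau(*score, lam=1.5, mu=1.1, rho=-0.1), 3))
```

The four adjustments cancel each other exactly, so the corrected probabilities still sum to 1 without renormalising.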
Implementation Steps
1. Prepare Data
Load historical match data with columns: date, home team, away team, home goals, away goals.
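A minimal loading sketch using only the standard library is shown below. The column names (home_team, home_goals, and so on), the ISO date format, and the two sample rows are all made up for illustration; real data would be read from a file instead of the inline string.

```python
import csv
import io
from datetime import date

# Made-up rows in an assumed column layout, standing in for a real CSV file.
SAMPLE_CSV = """date,home_team,away_team,home_goals,away_goals
2024-08-17,Team A,Team B,2,0
2024-08-24,Team B,Team A,1,1
"""


def load_matches(handle):
    """Parse CSV rows into dicts with typed fields (dates and integer goals)."""
    matches = []
    for row in csv.DictReader(handle):
        matches.append({
            "date": date.fromisoformat(row["date"]),
            "home": row["home_team"],
            "away": row["away_team"],
            "home_goals": int(row["home_goals"]),
            "away_goals": int(row["away_goals"]),
        })
    return matches


matches = load_matches(io.StringIO(SAMPLE_CSV))
# The sorted team list fixes the order of the team parameters later on.
teams = sorted({m["home"] for m in matches} | {m["away"] for m in matches})
print(len(matches), teams)
```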
2. Define the Model Parameters
Each team has an attack parameter (alpha) and a defence parameter (beta). Two further parameters are shared by all teams:
- A home advantage parameter (gamma)
- The dependence correction (rho)
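For a 20-team league, this means 42 parameters (20 attack + 20 defence + 1 home advantage + rho), all estimated together in the same likelihood. Because attack and defence values only matter relative to one another, one constraint is needed to pin down their scale (for example, fixing the average attack parameter at 1), which leaves 41 free parameters.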
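One way to hand these parameters to an optimiser is as a single flat vector. The layout below is an assumption of this sketch (attack values first, then defence values, then gamma, then rho, all in one vector), as are the helper name unpack_params, the placeholder team names, and the starting values.

```python
def unpack_params(params, teams):
    """Split a flat parameter vector into named pieces.

    Assumed layout: [attack per team..., defence per team..., gamma, rho].
    """
    n = len(teams)
    if len(params) != 2 * n + 2:
        raise ValueError(f"expected {2 * n + 2} values, got {len(params)}")
    attack = dict(zip(teams, params[:n]))
    defence = dict(zip(teams, params[n:2 * n]))
    gamma, rho = params[2 * n], params[2 * n + 1]
    return attack, defence, gamma, rho


teams = [f"Team {i + 1}" for i in range(20)]      # placeholder names
start = [1.0] * 20 + [1.0] * 20 + [1.3, -0.1]     # arbitrary starting values
attack, defence, gamma, rho = unpack_params(start, teams)
print(len(start), gamma, rho)  # 42 1.3 -0.1
```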
3. Apply Time-Decay Weighting
Multiply each match's contribution to the likelihood function by a decay factor:
Weight = exp(-xi x t)
Where t is the number of days since the match and xi controls the decay rate. With xi = 0.005 per day, for example, a match from 6 months ago (about 182 days) carries roughly 40% of the weight of a match played yesterday.
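The weighting is a one-liner in code. In this sketch the function name decay_weight and the two dates are assumptions chosen so that the gap is 182 days.

```python
import math
from datetime import date


def decay_weight(match_date: date, as_of: date, xi: float = 0.005) -> float:
    """Exponential time-decay weight, with t measured in days before as_of."""
    days_ago = (as_of - match_date).days
    return math.exp(-xi * days_ago)


# 1 January to 1 July 2024 is 182 days, so the weight is exp(-0.91), about 0.40.
print(round(decay_weight(date(2024, 1, 1), date(2024, 7, 1)), 3))

# Half-life in days: how old a match must be to count half as much.
print(round(math.log(2) / 0.005, 1))
```

A larger xi makes the model react faster to recent form at the cost of noisier estimates, so it is usually tuned against out-of-sample predictions instead of being fixed in advance.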
4. Optimise Using Maximum Likelihood
Use numerical optimisation (Python's scipy.optimize.minimize or R's optim) to find the parameters that maximise the weighted log-likelihood of the historical results. Both routines minimise by default, so in practice you pass them the negative log-likelihood.
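A compact sketch of the fitting step follows. It assumes SciPy is installed. The four-team results are invented toy data, the flat parameter layout, the bounds, and the choice of the SLSQP method are assumptions of this sketch, and all weights are set to 1 where a real fit would use the time-decay weights.

```python
import math
from scipy.optimize import minimize


def tau(x, y, lam, mu, rho):
    """Dixon-Coles correction factor for the scoreline x-y."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def neg_log_likelihood(params, matches, teams, weights):
    """Weighted negative log-likelihood.

    Assumed layout of params: [attack..., defence..., gamma, rho].
    """
    n = len(teams)
    idx = {team: i for i, team in enumerate(teams)}
    gamma, rho = params[2 * n], params[2 * n + 1]
    total = 0.0
    for (home, away, x, y), w in zip(matches, weights):
        lam = params[idx[home]] * params[n + idx[away]] * gamma
        mu = params[idx[away]] * params[n + idx[home]]
        t = max(tau(x, y, lam, mu, rho), 1e-10)  # guard against log of <= 0
        total += w * (math.log(t)
                      + x * math.log(lam) - lam - math.lgamma(x + 1)
                      + y * math.log(mu) - mu - math.lgamma(y + 1))
    return -total


def fit(matches, teams, weights):
    """Minimise the negative log-likelihood with mean attack fixed at 1."""
    n = len(teams)
    x0 = [1.0] * (2 * n) + [1.2, 0.0]
    bounds = [(0.05, 5.0)] * (2 * n + 1) + [(-0.2, 0.2)]
    constraint = {"type": "eq", "fun": lambda p: sum(p[:n]) / n - 1.0}
    return minimize(neg_log_likelihood, x0, args=(matches, teams, weights),
                    method="SLSQP", bounds=bounds, constraints=[constraint])


# Invented results: (home team, away team, home goals, away goals).
teams = ["A", "B", "C", "D"]
matches = [
    ("A", "B", 2, 1), ("A", "C", 1, 0), ("A", "D", 3, 1),
    ("B", "A", 1, 1), ("B", "C", 2, 0), ("B", "D", 1, 0),
    ("C", "A", 0, 2), ("C", "B", 1, 1), ("C", "D", 2, 2),
    ("D", "A", 0, 1), ("D", "B", 1, 2), ("D", "C", 0, 0),
]
weights = [1.0] * len(matches)  # replace with time-decay weights in practice

result = fit(matches, teams, weights)
print(result.success, [round(v, 2) for v in result.x])
```

With only twelve matches the estimates are not meaningful; the point is the shape of the objective and the constraint. On a full league history, supplying analytic gradients or switching to log-scale parameters may make the fit faster and more stable.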
5. Generate Predictions
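For a new match, combine the attack and defence parameters to get each side's expected goals (home = home attack x away defence x home advantage; away = away attack x home defence), then use the Poisson distribution with the Dixon-Coles correction to generate probabilities for every scoreline. Summing the scorelines in which the home side scores more, the same, or fewer goals gives the home win, draw, and away win probabilities.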
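The prediction step can be sketched as follows. The function names, the grid cut-off of 10 goals, and rho = -0.10 are assumptions of this sketch; the attack, defence, and home advantage values in the usage lines are those of the Arsenal v Brighton example.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles correction factor for the scoreline x-y."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def expected_goals(home_attack, home_defence, away_attack, away_defence, gamma):
    """Expected goals (lambda for the home side, mu for the away side)."""
    lam = home_attack * away_defence * gamma
    mu = away_attack * home_defence
    return lam, mu


def outcome_probs(lam, mu, rho, max_goals=10):
    """Home win, draw, and away win probabilities from the scoreline grid."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise for the truncated grid
    return home / total, draw / total, away / total


lam, mu = expected_goals(1.35, 0.82, 1.05, 1.12, gamma=1.36)
home, draw, away = outcome_probs(lam, mu, rho=-0.10)  # rho is an assumed value
print(f"lambda={lam:.2f} mu={mu:.2f} home={home:.3f} draw={draw:.3f} away={away:.3f}")
```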
Example: Arsenal (attack 1.35, defence 0.82) vs Brighton (attack 1.05, defence 1.12). Home xG = 1.35 x 1.12 x 1.36 (home advantage) = 2.06. Away xG = 1.05 x 0.82 = 0.86. Assuming rho = -0.10, the Dixon-Coles correction gives roughly: Home win 64%, Draw 22%, Away win 14%.
A £30 bet on the draw at odds of 3.60 returns £108 if it wins (£78 profit plus the £30 stake). Those odds imply a draw probability of 27.8% (1/3.60). If your model gives the draw only about 22%, the odds are shorter than the model says they should be, so there is no value in backing the draw. But if another market shows an edge, your model guides you to it.
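The value check is simple arithmetic on the odds. In the sketch below the function names and the three trial probabilities are illustrative assumptions; the stake and the odds are those of the draw bet above.

```python
def implied_probability(decimal_odds: float) -> float:
    """Break-even probability implied by decimal odds (ignores the bookmaker margin)."""
    return 1.0 / decimal_odds


def payout(stake: float, decimal_odds: float) -> float:
    """Total returned on a winning bet, stake included."""
    return stake * decimal_odds


def expected_profit(stake: float, decimal_odds: float, model_prob: float) -> float:
    """Average profit per bet if the model probability is correct."""
    return stake * (model_prob * decimal_odds - 1.0)


print(round(payout(30, 3.60), 2))             # 108.0
print(round(implied_probability(3.60), 3))    # 0.278

# A bet has value only when the model probability exceeds the implied one.
for model_prob in (0.20, 0.25, 0.30):
    print(model_prob, round(expected_profit(30, 3.60, model_prob), 2))
```

Any model probability below 27.8% gives a negative expected profit at these odds, which is why the draw offers no value in this example.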
Beyond Dixon-Coles
The model can be extended with additional features: incorporating expected goals (xG) data, adding team-specific home advantages, or using bivariate Poisson distributions instead of the correction factor. Each extension adds complexity but potentially improves accuracy.