World Cup Predictions: in a bonkers first round of games, even the best models get just over 50%

By: Patrick W. Zimmerman

Soccer is a maddening sport to predict. Modeling soccer is easy because of the 88 billion games per year and hard because of the poor resolution in each game. Goals per game are low, so significant differences in team quality still sometimes fail to show up on any one game’s scoreboard. This is in extreme contrast to, say, basketball, where the 75-125 scoring events per game tend to minimize luck as a determining element.

That said, it’s amazing and irresistible. The spread and evolution of the world’s game has created a diversity of tactics, players, and competitions at the highest level that no other sport can match. Unlike basketball, hockey, baseball, or (American) football, it has resisted the consolidation of all the best players into one single league (soccer fans usually speak of the Big Five leagues: England, Spain, Germany, Italy, and France). That fragmentation breeds the most precious of sports situations: unfamiliarity.

In such an environment, there are fundamentally different tactical formations and player strategies clashing, without teams having well-drilled and rote counters for them. Even some of the more popular formations (4-4-2, 4-3-3, 4-2-3-1) can be played very differently. Spain’s midfield-focused 4-3-3 during their run of greatness at times seemed to have no forwards at all, filled with patient, methodical, tiny ball-hogs passing the ball into the net. The Chilean 3-3-1-3 / 3-4-1-2 under Bielsa & Sampaoli was a blitz of motion and energy and pressing. Part of the success of many great teams is the novelty of their approach (at least initially).

So, tactical variance and the relatively high influence of luck both make the World Cup hard to forecast and incredibly compelling entertainment.

Game on!


The question

How is our model, incorporating historical World Cup results, relative player pool size, and recent performance, doing against other prediction systems?


The short-short version

It’s doing ok relative to other models! It’s doing eeeeehhhhhh relative to reality.


The models

We’ll compare our model’s performance to:

  • The experts at The FiveThirtyEight.
  • The collective wisdom of the betting public using OddsPortal’s meta-odds combining 14 casino books and online oddsmakers (we’ll use the final line before kickoff).
  • And, for humor and a baseline, FIFA’s rankings (we’ll treat any game between teams separated by 150 or fewer ranking points as a predicted draw; see the sketch below).
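For concreteness, here’s a minimal Python sketch of that FIFA-rankings baseline. The 150-point threshold is the rule above; the function and parameter names are our own invention.

```python
def fifa_baseline_pick(team_a, team_b, ranking_points, threshold=150):
    """Call a draw if the teams' FIFA ranking points are within `threshold`
    of each other; otherwise pick the higher-ranked team."""
    gap = ranking_points[team_a] - ranking_points[team_b]
    if abs(gap) <= threshold:
        return "TIE"
    return team_a if gap > 0 else team_b
```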

We’ll measure along two scales. First, a points system that assigns 1 point for every correct result, scaling up with each round (1 for group-stage games, 2 for the round of 16, 4 for quarterfinals, 8 for semifinals and the 3rd-place game, 16 for the final). To mirror soccer’s 3-points-for-a-win system, if the prediction is off by only a little bit (e.g., the model called a win for Team A but the game ended in a tie), 1/3 of the point value will be awarded.

The second measure will be a simple system, taking into account only correctly called results: did the model get it right, full stop?
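To make the bookkeeping concrete, here’s a minimal Python sketch of both measures. The round multipliers and the 1/3 partial credit come straight from the description above; the names and structure are ours.

```python
# Round multipliers as described above (group games count 1, the final counts 16).
ROUND_MULTIPLIER = {
    "group": 1, "round_of_16": 2, "quarterfinal": 4,
    "semifinal": 8, "third_place": 8, "final": 16,
}

def weighted_points(predicted, actual, stage="group"):
    """Measure 1: full credit for a correct result, 1/3 credit for a near miss
    (a predicted win that ends in a draw, or vice versa), scaled by round."""
    multiplier = ROUND_MULTIPLIER[stage]
    if predicted == actual:
        return multiplier
    if predicted == "TIE" or actual == "TIE":  # off by "only a little bit"
        return multiplier / 3
    return 0  # called the wrong winner outright

def correct_results(predictions, actuals):
    """Measure 2: a raw count of correctly called results, nothing else."""
    return sum(p == a for p, a in zip(predictions, actuals))
```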


Model comparisons after one group game for every team

Model scoreboard
Model                  Points  Points %  Correct results  Correct %
Principally Uncertain  10      0.625     9                0.563
The FiveThirtyEight    10      0.625     9                0.563
Betting Markets        10      0.625     9                0.563
FIFA rankings          8 ⅓     0.521     6                0.375

So, good news and bad news for us. Good news: our model is holding up well in comparison with the others and kicks the pants off of FIFA’s risible rankings. Bad news: getting just over half of the results right is pretty underwhelming.

Note for future projects: soccer predictions are hard.


Game predictions and results
Stage    Game                    P?   538  Odds  FIFA  Actual result
Group A  Russia v. Saudi Arabia  RUS  RUS  RUS   TIE   RUS, 5-0
Group A  Egypt v. Uruguay        URU  URU  URU   URU   URU, 1-0
Group B  Portugal v. Spain       TIE  ESP  ESP   TIE   TIE, 3-3
Group B  Morocco v. Iran         MAR  MAR  MAR   TIE   IRN, 1-0
Group C  France v. Australia     FRA  FRA  FRA   FRA   FRA, 2-1
Group C  Peru v. Denmark         DEN  DEN  DEN   TIE   DEN, 1-0
Group D  Argentina v. Iceland    ARG  ARG  ARG   ARG   TIE, 1-1
Group D  Croatia v. Nigeria      CRO  CRO  CRO   CRO   CRO, 1-0
Group E  Brazil v. Switzerland   BRA  BRA  BRA   BRA   TIE, 1-1
Group E  Costa Rica v. Serbia    TIE  SRB  SRB   TIE   SRB, 1-0
Group F  Germany v. Mexico       GER  GER  GER   GER   MEX, 1-0
Group F  Sweden v. South Korea   SWE  SWE  SWE   SWE   SWE, 1-0
Group G  Belgium v. Panama       BEL  BEL  BEL   BEL   BEL, 3-0
Group G  Tunisia v. England      ENG  ENG  ENG   TIE   ENG, 2-1
Group H  Colombia v. Japan       COL  COL  COL   COL   JPN, 2-1
Group H  Poland v. Senegal       POL  POL  POL   POL   SEN, 2-1
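As a sanity check, feeding the Principally Uncertain column of the table above into the scoring sketch from earlier reproduces its row of the scoreboard (all group-stage games, so every multiplier is 1):

```python
# (our prediction, actual result) for the 16 first-round games, in table order
pu_games = [
    ("RUS", "RUS"), ("URU", "URU"), ("TIE", "TIE"), ("MAR", "IRN"),
    ("FRA", "FRA"), ("DEN", "DEN"), ("ARG", "TIE"), ("CRO", "CRO"),
    ("BRA", "TIE"), ("TIE", "SRB"), ("GER", "MEX"), ("SWE", "SWE"),
    ("BEL", "BEL"), ("ENG", "ENG"), ("COL", "JPN"), ("POL", "SEN"),
]

points = sum(weighted_points(pred, actual, "group") for pred, actual in pu_games)
correct = correct_results(*zip(*pu_games))
print(round(points, 2), correct)  # 10.0 9 -- i.e., 10 points and 9 correct calls out of 16
```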

What next?

Ok, so modeling soccer on a game-by-game basis is hard. That’s also what makes it fun!

In addition to updates on model performance after every set of games, we’ll also look at how each model performs at different resolutions once enough games have been played. That is to say, we assume every model will be imperfect; how close does each one get, overall, to nailing each team’s performance as a whole?

Huh, looks like math has a way to measure that. Meet our good friend, variance (σ²).

At the end of the group stage, and then again at the end of the tournament, we’ll look at the variance between the points each model projected a team to earn and the points that team actually earned. Rather intuitively, lowest σ² = best model. Looking at variance across the whole field will also account for some crazy runs (South Korea 2002) as well as disappointing performances (Spain 2014). This will let us see both how accurate the models were in an absolute sense and how they stack up against each other.
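As a rough sketch of what that check will look like (the team labels and point totals below are entirely made up, just to show the shape of the comparison):

```python
from statistics import pvariance

def model_error_variance(projected_points, actual_points):
    """Variance (sigma squared) of the gaps between a model's projected
    group-stage points and each team's actual points; smaller is better."""
    errors = [projected_points[team] - actual_points[team] for team in actual_points]
    return pvariance(errors)

# Hypothetical four-team group: projections vs. final group-stage points.
projected = {"Team A": 7, "Team B": 4, "Team C": 3, "Team D": 1}
actual    = {"Team A": 9, "Team B": 4, "Team C": 1, "Team D": 1}
print(model_error_variance(projected, actual))  # 2.0
```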

This cup is bonkers. We’re going to need more beer.

About The Author

Architeuthis Rex, a man of (little) wealth and (questionable) taste. Historian and anthropologist interested in identity, regionalism / nationalism, mass culture, and the social and political contexts in which they exist. Earned a Ph.D. in social and cultural history with a concentration in anthropology from Carnegie Mellon University and then (mostly) fled academia to write things that more than 10 other people will actually read. Driven to pursue a doctorate to try and answer the question, "Why do they all hate each other?" — still working on it. Plays beer-league hockey, softball, and soccer. Professional toddler wrangler. Likes dogs, good booze, food, and horribly awesome kung-fu movies.
