Applied Bracketeering, 2018 Edition: Do streaks matter?

By: Richard W. Sharp

Normally in winter, the world uses waste heat from cryptocurrency mining to heat homes, but soon it will be time to refocus the world’s computing power on something much more productive: producing pool-busting brackets.

It’s time for March Madness, baby! Last year we had a little fun while following the tournament by taking a serious, principled approach based on randomness seeded by school mascots. This year, we’re going to take a new approach by looking for something that the smart money hasn’t taken into account: streaks.   



The Question

How to win the pool?

We’re not trying to objectively determine which team is best; we’re trying to win cold hard cash. That means we focus on what our opponents, the other bettors, will be doing, more than on the teams playing on the court: they will generally start from chalk, pick a couple of upsets, and express some irrational overconfidence in their father’s brother’s nephew’s cousin’s former roommate’s alma mater.


Last year’s approach: assume upsets are random

Our number one priority is to produce winning brackets. Playing straight seeds is a great way to look smart while losing. In order to place high enough to earn a year’s worth of lording it over your officemates, you need to make some picks that the smart money won’t: you need upsets.

Last year we took this at face value by assuming an upset is just that: something unpredictable, something random. By adding the right amount of randomness to our picks, we hoped to find some unusual winners that would translate to victory. We seeded the randomness by ranking mascots, a random characteristic of each school, from mighty forces of nature, through a middling zoo of animals, on down to a lowly set of colors. In the end the approach was nice, not thrilling, but nice. Playing chalk or RPI did as expected (poorly), and we concluded that “if you picked one of the high performing randomized brackets, then you should be in striking distance within your pool.” The approach showed promise, but we’re not here to talk about the past.


The new wrinkle

This year, it’s time to do more than just throw in some randomization. Is there something other than random noise that might predict an upset? One candidate is a team on a hot streak. An end-of-season streak may be an indicator that something fundamental has changed for a team, and at just the right time to march to March glory. Perhaps a young team has finally gained enough time on the court together to function as a cohesive unit or maybe a star player has returned from injury. In any case, many of the official ranking systems do not take this information into account. To the simple algorithmic approaches, such as perennial underperformer RPI, a win is a win whether it comes in November or on the closing day of a conference tournament.   

You’ve heard all these nice, heartwarming storylines. So have we. Let’s test them. First, though, here is how the usual ranking sources handle the timing of a win:

  • RPI: Based on a simple combination of winning percentage and the winning percentage of opponents and opponents’ opponents. Percentages don’t take the timing of a win into consideration.
  • LRMC: A Bayesian model out of Georgia Tech that primarily uses point difference in head-to-head competition. It is indifferent to the sequence in which teams play against each other.
  • Coaches poll and AP Top 25: These are subjective polls of individual experts. The final poll before the tournament begins may take streaks into account if the individual voters consider them important.
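To make the timing point concrete, here is a minimal sketch of an RPI-style calculation. It uses the commonly cited 25/50/25 weighting of winning percentage, opponents’ winning percentage, and opponents’ opponents’ winning percentage. The function names and the toy schedule are invented for illustration, and the official formula has refinements (home/away weighting, leaving out games against the team being rated) that are deliberately skipped here.

```python
import random

def win_pct(team, games):
    """Fraction of its games that `team` won; games are (winner, loser) pairs."""
    played = [g for g in games if team in g]
    if not played:
        return 0.0
    return sum(1 for winner, _ in played if winner == team) / len(played)

def opponents(team, games):
    """Opponents of `team`, one entry per game played."""
    return [loser if winner == team else winner
            for winner, loser in games if team in (winner, loser)]

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def simple_rpi(team, games):
    """Simplified RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP (no home/away weights)."""
    opps = opponents(team, games)
    wp = win_pct(team, games)
    owp = mean(win_pct(o, games) for o in opps)
    oowp = mean(mean(win_pct(oo, games) for oo in opponents(o, games))
                for o in opps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Toy season in chronological order (hypothetical teams A-D).
season = [("A", "B"), ("C", "A"), ("B", "C"), ("A", "D"), ("D", "C"), ("A", "C")]

# The same games with the calendar scrambled.
shuffled = season[:]
random.Random(0).shuffle(shuffled)

print(simple_rpi("A", season), simple_rpi("A", shuffled))
```

The two printed values agree up to floating-point rounding: the calculation only ever sees counts of wins and losses, so a late-season surge and an early-season surge look identical to it.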

Of course, many of the teams in the tournament will have end-of-season winning streaks by definition: the conference tournament winners get berths in the tournament. We’ll have to work a bit harder than simply adding a binary “on streak/not on streak” variable to the model. We’ll try to determine the quality of the streak by considering factors such as the length of a streak (including whether it needs to be continuous) or the strength of the opponents.1
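One way such a streak feature might be sketched is below. This is an illustration under stated assumptions, not the model itself: the function name, the `max_losses` parameter, and the 0-to-1 opponent strength scale are all hypothetical. It walks backward from a team’s final game, counts wins, optionally tolerates a few losses (so the streak need not be continuous), and adds up the strength of the opponents beaten.

```python
def streak_score(results, max_losses=0):
    """Score a team's end-of-season run.

    results: chronological list of (won, opponent_strength) pairs, where
             opponent_strength is a rating in [0, 1] (hypothetical scale).
    max_losses: losses tolerated before the run is considered over;
                0 means the streak must be continuous.
    Returns (length, quality): the number of wins in the closing run and
    the summed strength of the opponents beaten during it.
    """
    length, quality, losses = 0, 0.0, 0
    for won, strength in reversed(results):  # walk back from the last game
        if won:
            length += 1
            quality += strength
        else:
            losses += 1
            if losses > max_losses:
                break
    return length, quality

# Hypothetical schedule, oldest game first.
results = [(True, 0.4), (False, 0.9), (True, 0.5), (True, 0.75),
           (False, 0.5), (True, 0.25), (True, 0.5)]

print(streak_score(results))                # (2, 0.75)
print(streak_score(results, max_losses=1))  # (4, 2.0)
```

Returning the length and the quality separately keeps the two ideas apart: two wins over weak opponents and two wins over strong ones have the same length but very different quality.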


Conclusion

So here we go: pick up from where we left off last year by using a range of individual model rankings to establish a baseline, add a streak factor to try to detect a fundamental shift in a team that the polls (simple algorithmic ones at least) have missed, and finally add a dash of randomization to pick some upsets that are just flukes. We’ll create a wide range of brackets under a handful of different scenarios and see what performs best in the tournament. We’ll also compare these against some prominent brackets produced by friends and strangers alike.2 No strategy can win the pool every year, but hopefully the models can be improved to the point where they produce brackets that place well consistently.
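As a rough sketch of how those three ingredients could fit together (an illustration, not the actual implementation): each team gets a baseline rating, a streak bonus is added on top, and Gaussian noise supplies the flukes. The names `pick_winner` and `simulate_bracket`, the rating scale, and the toy four-team field are all assumptions, and the field size is assumed to be a power of two.

```python
import random

def pick_winner(team_a, team_b, rating, streak_bonus, noise, rng):
    """Pick a winner from baseline rating + streak bonus + random noise."""
    def score(team):
        return rating[team] + streak_bonus.get(team, 0.0) + rng.gauss(0.0, noise)
    return team_a if score(team_a) >= score(team_b) else team_b

def simulate_bracket(field, rating, streak_bonus, noise, seed):
    """Play a single-elimination bracket; `field` is listed in bracket order.

    Returns the list of survivors after each round.
    """
    rng = random.Random(seed)
    rounds = []
    while len(field) > 1:
        field = [pick_winner(field[i], field[i + 1],
                             rating, streak_bonus, noise, rng)
                 for i in range(0, len(field), 2)]
        rounds.append(field)
    return rounds

# Hypothetical four-team field with made-up ratings.
field = ["Chalk U", "Hot Streak St", "Midmajor", "Longshot"]
rating = {"Chalk U": 0.9, "Hot Streak St": 0.6, "Midmajor": 0.5, "Longshot": 0.2}
streak_bonus = {"Hot Streak St": 0.4}

# With no noise, the streak bonus alone produces the upset of Chalk U.
print(simulate_bracket(field, rating, streak_bonus, noise=0.0, seed=1))
# [['Hot Streak St', 'Midmajor'], ['Hot Streak St']]

# With noise, different seeds give a range of brackets to enter.
brackets = [simulate_bracket(field, rating, streak_bonus, noise=0.3, seed=s)
            for s in range(5)]
```

Turning the `noise` knob up or down is what distinguishes the scenarios: zero noise reproduces the adjusted rankings exactly, while larger values hand out more fluke upsets.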

Let’s get ready to rumble.


Notes:
1 If possible, we will also add player injuries to the mix, downgrading teams that have lost a starter shortly before the tournament. Good data for this is available, but we will focus first on accounting for a streak feature for the model and add injuries as time allows (or doesn’t).
2 It seems unlikely that there will be a Commander-in-Chief bracket this year, unless somebody tells the current office holder that his predecessor did it better than he could.

About The Author

Richard is a Seattle area data scientist who builds predictive models and the services that deliver them. He earned a PhD in Applied and Computational Math from Princeton University, and left academia for the dark side of science (industry) in 2010, following his wife to the land of flannel. Fan of coffee, beer, backpacking and puns. Enjoys a day on the lake fishing, and, better, cooking up the catch for a crowd.
