Bracketology talk at the Crossroads of ideas

The Math Behind the
March Madness Tournament and
College Football Playoff
Laura Albert McLay
Associate Professor, ISYE
laura@engr.wisc.edu
@lauramclay
@badgerbrackets
http://bracketology.engr.wisc.edu/

Let’s start with the 2-minute
version of my talk
https://www.facebook.com/UWMadison/videos/10154004638653114/

First of all…
I’m an industrial and systems
engineering professor by day and a bracketologist by night!

I study systems
A system is a set of things—people, cells, vehicles,
basketball teams, or whatever—interconnected in
such a way that they produce their own pattern of
behavior over time.
My discipline is operations research: the science of
making decisions using advanced analytical methods

Our world is becoming increasingly
complex and increasingly connected
Systems matter!
Mathematical models
and systems thinking
help us study systems
and navigate the
complex, interconnected
world we live in.

What do we hope to learn from
probability models like Markov
chains?
• How do we draw conclusions from limited data?
• How can we make data-driven decisions in the
presence of uncertainty?

How I got started in bracketology
In 2014 someone suggested I examine
bracketology in the context of the first
College Football Playoff…
…and so began Badger Bracketology
My objective: forecast which teams
would make the first college football
playoff before the season was over.

Markov chains:
The Little Engine that Could
Markov chains:
A type of math model for understanding how a
system can evolve over time.
Uses: finance, epidemiology, queues, zombies

Markov chains for ranking teams in a nutshell
Each team is a state. A team “votes” for the teams that it loses to
http://sumnous.github.io/blog/2014/07/24/gephi-on-mac/
Graph of 2014
college football season
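The voting idea can be sketched in a few lines. This is a minimal illustration, not the Badger Bracketology model: every loss casts one whole vote (no score differentials or home/away adjustment), and the `damping` term is an assumption borrowed from PageRank so that the chain has a unique long-run distribution. The teams and results are made up.

```python
import numpy as np

def rank_teams(teams, games, damping=0.85):
    """Rank teams by the long-run share of 'votes' each one collects.

    games is a list of (winner, loser) pairs; each loser votes for the winner.
    """
    n = len(teams)
    idx = {team: k for k, team in enumerate(teams)}
    votes = np.zeros((n, n))
    for winner, loser in games:
        votes[idx[loser], idx[winner]] += 1.0
    # An undefeated team has nobody to vote for, so it keeps its own vote.
    for k in range(n):
        if votes[k].sum() == 0:
            votes[k, k] = 1.0
    P = votes / votes.sum(axis=1, keepdims=True)   # row-stochastic transitions
    P = damping * P + (1.0 - damping) / n          # assumption: PageRank-style damping
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):                          # power iteration
        pi = pi @ P
    return sorted(zip(teams, pi), key=lambda pair: -pair[1])

# Hypothetical three-team season: A beat B, B beat C, A beat C.
ranking = rank_teams(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
```

The team with the largest long-run share is ranked first; beating a team that itself collects many votes is worth more, which is how strength of schedule enters.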

Simple yet powerful idea
Automatically rates and ranks teams by
taking advantage of the network structure
of the match ups
• Use Markov chains to account for strength of schedule
• Do not need a human in the loop
Simple data requirements:
1. Game outcomes (score differentials),
2. Home/away status
Takes difficulty of future games into account in football playoff
forecasts
• Polls give the ranking right now, which only gives insight into a playoff held
today

Google PageRank is a Markov model!
Source: google.com

Do you remember Internet searches
before Google?
https://www.wordstream.com/articles/internet-search-engines-history

First, let’s talk about
ranking basketball teams

Transitions
Rutgers 52 @ Wisconsin 72
[Diagram: two states, Wisconsin and Rutgers, with transition arcs labeled 𝑊 and 1 − 𝑊]
How much credit should Wisconsin get for beating Rutgers by
20 at home?
𝑊 = effective wins (fraction of a vote), which help us compute
our Markov chain transition probabilities

Let’s find a data-driven answer!
Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the
probability that 𝑖 is a better team than 𝑗 on a neutral court?
Data: Some teams play twice per season (home and away)
Given that team 𝑖 beat team 𝑗 by 𝑥 points at home, what is the
probability that 𝑖 is a better team than 𝑗 on 𝑗’s home court?
$r_x^H$ ($r_x^A$) = probability that a team that outscores its opponent by $x$
points at home $H$ (away $A$) is better than its opponent on a
neutral site $N$
Developed by Sokol, Kvam, Nemhauser, and Brown at Georgia Tech to rank NCAA men’s basketball teams
https://www2.isye.gatech.edu/~jsokol/lrmc/

What is the probability you win your next
game (on the road) given that you win by 20 at
home?

Logistic regression to the rescue!
Problem 1: must win by 50+ points to get a lot of credit for a win!
Winning/losing close games gives you the same amount of “credit”
[Figure: probability of winning on the road next time vs. margin of victory 𝑥]
Problem 2: We need to get neutral site win probabilities
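A rough sketch of the regression step, assuming home-and-home pairs have been reduced to two columns: the home margin in the first game and whether the same team won the road rematch. The data below is synthetic (generated from made-up coefficients), and the plain gradient-ascent fit is only one simple way to estimate the two parameters.

```python
import numpy as np

def fit_logistic(x, y, steps=5000, lr=0.1):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(a*x + b))) by gradient ascent."""
    xs = x / 10.0                      # rescale margins so fixed steps behave well
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * xs + b)))
        a += lr * np.mean((y - p) * xs)
        b += lr * np.mean(y - p)
    return a / 10.0, b                 # slope expressed per point of margin

# Synthetic stand-in for real home-and-home data (not from the talk).
rng = np.random.default_rng(0)
margin = rng.integers(-30, 31, size=2000).astype(float)
true_p = 1.0 / (1.0 + np.exp(-(0.03 * margin - 0.3)))
won_road_rematch = (rng.random(2000) < true_p).astype(float)

a, b = fit_logistic(margin, won_road_rematch)
p_after_20 = 1.0 / (1.0 + np.exp(-(a * 20 + b)))   # road win chance after +20 at home
```

With real schedules, `margin` and `won_road_rematch` would come from teams that met twice in one season, once at each site.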

Logistic regression for
NCAA men’s basketball
• Use log (Point differentials) instead!
• Do not truncate point differentials
[Figure: effective wins (0 to 1) vs. point differential (−30 to 30)]

Winning matters
• Average in a pure win/loss model to give more credit for winning the
game
[Figures: two plots of effective wins (0 to 1) vs. point differential (−30 to 30)]

Putting it all together
• End up with the red line!
[Figures: two plots of effective wins (0 to 1) vs. point differential (−30 to 30)]

Markov chain transition probabilities
Rutgers 52 @ Wisconsin 72 *
[Diagram: two states, Wisconsin and Rutgers, with transition arcs labeled 𝑊 and 1 − 𝑊]
How much credit should Wisconsin get for beating Rutgers by 20 at home?
P(UW beats Rutgers on a neutral court) = 0.6255
𝑊 = 0.6817 effective wins (fraction of a vote)
* Wisconsin 61 @ Rutgers 54 later on 1/28/2017
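To connect the two numbers on this slide, here is a small sketch. The slides do not state the weight used when averaging in the pure win/loss model; `win_weight=0.15` is an assumption, chosen only because it reproduces 0.6817 from 0.6255. The direction of the arcs follows one reading of the diagram (the usual LRMC convention, where a vote tends to move toward the team that is probably better).

```python
import numpy as np

def effective_wins(p_better, win_weight):
    """Blend the logistic-regression probability with a pure win (worth 1.0)."""
    return (1.0 - win_weight) * p_better + win_weight * 1.0

# 0.6255 is from the slide; win_weight=0.15 is an assumption (see lead-in).
W = effective_wins(0.6255, win_weight=0.15)

# States: 0 = Wisconsin (winner), 1 = Rutgers (loser), for this one game only.
P = np.array([
    [W, 1.0 - W],   # Wisconsin keeps W of its vote and passes 1 - W to Rutgers
    [W, 1.0 - W],   # Rutgers sends W of its vote to Wisconsin and keeps 1 - W
])
```

In a full season each team's row averages these per-game splits over all of its games, so one lopsided result cannot dominate the rating.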

Transitions
Same idea for the rest of the games…
Wisconsin
Minnesota
Northwestern
Rutgers
Illinois

Current rankings
3/12/2017 Selection Sunday
1 Gonzaga
2 Villanova
3 Kentucky
4 SMU
5 Wichita St
6 Arizona
7 UCLA
8 Duke
9 Cincinnati
10 Oregon
11 MTSU
12 North Carolina
13 St Marys CA
14 West Virginia
15 Kansas
16 Nevada
17 Purdue
18 Vermont
19 UNC Wilmington
20 Michigan
21 Florida St
22 VA Commonwealth
23 Notre Dame
24 Bucknell
25 Wisconsin

The B1G, ranked.
3/12/2017
17 Purdue
20 Michigan
25 Wisconsin
41 Northwestern
43 Minnesota
54 Maryland
78 Indiana
87 Michigan St
121 Iowa
130 Illinois
141 Ohio St
176 Penn St
187 Rutgers
242 Nebraska

How did we do last year?
3/13/2016 Selection Sunday
1. North Carolina
2. Kansas
3. Villanova
4. Michigan St
5. Virginia
6. West Virginia
7. Oklahoma
8. Kentucky
9. Oregon
10. Purdue
11. Xavier
12. Miami FL
13. Duke
14. Utah
15. Texas A&M
16. Louisville
17. Maryland
18. Arizona
19. Seton Hall
20. Iowa St
21. Indiana
22. California
23. Baylor
24. St Josephs PA
25. Iowa

Now let’s talk about the College Football Playoff

Objective: determine which teams would make the first
college football playoff.
Goal: to forecast the top 4 teams weeks before the season
ends.
Solution method: a ranking method.
Challenge: need to simulate the remainder of the season and
rank the teams at the end of the (simulated) season.

Giant assumption
• We assume the selection committee will pick the four
top-ranked teams for the playoff.
• History suggests that humans prefer the most deserving
teams rather than the best teams in the national
championship game.
• E.g., 2013 Alabama lost on
a fluke play.
• …but the College Football
Selection Committee might
have changed this!
2013 BCS Rankings just before bowl bids

College football playoff
committee rankings
[Images: 2014 Playoff rankings; 2015 Playoff rankings]

How we did last year
2016 Playoff Rankings Badger Bracketology rankings
1 Alabama
2 Ohio State
3 Clemson
3 Washington
5 Michigan
6 Penn State
7 Western Michigan
8 Louisville
9 Oklahoma
10 Wisconsin :(

Model: two parts
0. Observe a few (7-8) weeks of game outcomes
1. Ranking.
• Assign a rating to each team to rank the teams.
• Similar to what we had before but with college football data
2. Game simulation.
• Determine who wins a game based on the team ratings.
Simulate the next week’s game outcomes.
• Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4

Score differentials
Yes, running up the score matters, mathematically.
[Figure: histogram of score differentials, 2012-2014; x-axis: home score − away score; y-axis: frequency]

Capped score differentials
38% of conference games fall beyond the cap
[Figure: histogram of score differentials capped at +/-21, 2012-2014; x-axis: home score − away score; y-axis: frequency]
Note: Rating systems used by College Football Playoff committee must use wins/losses
only (not score differentials). Running up the score makes a difference!
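Capping is a one-line operation. The sketch below truncates margins at ±21 and reports the share of games that fall beyond the cap; the list of margins is invented for illustration and is not the 2012-2014 data.

```python
import numpy as np

def cap_differentials(diffs, cap=21):
    """Truncate score differentials to [-cap, cap] and report the share beyond the cap."""
    diffs = np.asarray(diffs, dtype=float)
    capped = np.clip(diffs, -cap, cap)
    share_beyond = float(np.mean(np.abs(diffs) > cap))
    return capped, share_beyond

# Hypothetical home-minus-away margins.
capped, share = cap_differentials([3, -7, 24, 45, -31, 10, -21, 14])
```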

Build the Markov chain for football
• Used 3 seasons of data (truncate scores by +/-21)
• Use games played in consecutive years to identify win
probabilities to feed into the Markov chain
[Figure: effective wins (0 to 1) vs. point differential (−20 to 20), with curves labeled $S_x^H$, $r_x^H$, and $r_x^N$]
[Figure: curves (0 to 1) over point differential (−20 to 20); legend: logistic regression; logistic regression averaged with win (weight = 2/3); logistic regression averaged with win (weight = 1/3)]

Modified Log Logistic Regression
Markov Chain (ln(mLRMC))
• Same as mLRMC except that we consider log point differentials to
dampen big score differentials
• Do not truncate point differentials
[Figure: effective wins (0 to 1) vs. point differential (−20 to 20); legend: logistic regression (home team); logistic regression averaged with win]
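The slides do not give the exact log transform, so the following is only one plausible form: a signed `log1p`, which keeps the winner's sign, maps a tie to zero, and compresses blowouts. Treat the function and its name as hypothetical.

```python
import math

def log_differential(x):
    """Signed log transform: same sign as x, large margins are compressed."""
    return math.copysign(math.log1p(abs(x)), x)

# Going from +21 to +42 adds less than going from +7 to +21.
extra_from_blowout = log_differential(42) - log_differential(21)
extra_from_solid_win = log_differential(21) - log_differential(7)
```

Because big margins are dampened by the transform itself, no hard cap is needed, which matches the "do not truncate" bullet.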

Markov chain transitions
Use mLRMC and ln(mLRMC) for all games

Simulate the rest of the season!
0. Observe a few (7-8) weeks of game outcomes
1. Ranking.
• Assign a rating to each team to rank the teams.
2. Game simulation.
• Determine who wins a game based on the team ratings.
Simulate the next week’s game outcomes.
• Combine these:
• Re-rate and re-rank after each week of games.
• Simulate the remainder of the season.
• Report teams most likely to be in the top 4
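The loop above can be sketched as a small Monte Carlo simulation. This is a simplified stand-in, not the talk's code: game winners are drawn from a logistic function of the rating difference (with placeholder parameters `a` and `b`), and teams are ranked at the end by simulated wins and then rating, instead of being re-rated with the Markov chain after every week. All teams, ratings, records, and fixtures are hypothetical.

```python
import math
import random
from collections import Counter

def win_prob(r_home, r_away, a=1.0, b=0.0):
    """Logistic home-win probability; a and b are placeholder parameters."""
    return 1.0 / (1.0 + math.exp(-(b + a * (r_home - r_away))))

def simulate_playoff_odds(ratings, wins, remaining, n_sims=1000, seed=0):
    """Count how often each team finishes in the top 4 across simulated seasons."""
    rng = random.Random(seed)
    top4 = Counter()
    for _ in range(n_sims):
        sim_wins = dict(wins)
        for home, away in remaining:
            if rng.random() < win_prob(ratings[home], ratings[away]):
                sim_wins[home] += 1
            else:
                sim_wins[away] += 1
        # Simplification: rank by wins, then rating (no weekly re-rating).
        order = sorted(sim_wins, key=lambda t: (sim_wins[t], ratings[t]), reverse=True)
        top4.update(order[:4])
    return top4

# Hypothetical six-team league midway through the season.
ratings = {"A": 1.2, "B": 0.9, "C": 0.5, "D": 0.1, "E": -0.4, "F": -0.8}
wins = {"A": 7, "B": 7, "C": 6, "D": 6, "E": 5, "F": 4}
remaining = [("A", "B"), ("C", "D"), ("E", "F"), ("B", "C"), ("D", "A")]
odds = simulate_playoff_odds(ratings, wins, remaining)
```

`odds[team]` is the number of simulated seasons (out of `n_sims`) in which that team lands in the top 4, which is the same kind of "out of 1000" count reported in the results tables.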

Win probability parameters
The win probability between teams $i$ and $j$, where $i$ is the home
team, is captured by the best-fit logistic regression model using
two years of game data:
$$p_{ij} = \frac{e^{b + a(r_i - r_j)}}{1 + e^{b + a(r_i - r_j)}}$$
where $r_i - r_j$ is the difference in ratings between the two teams.
We then assign a point differential to the winner.
Game prediction accuracy (averaged per game)
Statistic             Model       Training set   Test set
Mean Absolute Error   mLRMC       0.2043         0.3152
                      ln(mLRMC)   0.2026         0.3162
Mean Squared Error    mLRMC       0.1006         0.1885
                      ln(mLRMC)   0.0999         0.1897
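For reference, the two statistics can be computed as below, assuming each error is the gap between the predicted win probability and the 0/1 game outcome (the slides do not spell out the exact definition). The four predictions are made up.

```python
def prediction_errors(probs, outcomes):
    """Mean absolute error and mean squared error of predicted win probabilities."""
    n = len(probs)
    mae = sum(abs(y - p) for p, y in zip(probs, outcomes)) / n
    mse = sum((y - p) ** 2 for p, y in zip(probs, outcomes)) / n
    return mae, mse

# Hypothetical predictions for four games; outcome 1 means the predicted team won.
mae, mse = prediction_errors([0.9, 0.6, 0.3, 0.8], [1, 0, 0, 1])
```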

College football playoff
committee rankings
[Images: 2016 Playoff rankings; 2015 Playoff rankings]

2016 Results: Rankings (NOW)
Team        Method         Week 7   Week 8   Week 9   Week 10  Week 11  Week 12  Week 13  Week 14
Alabama     CFP Committee                    1        1        1        1        1        1
            ln(mLRMC)      1        1        1        1        1        1        1        1
Clemson     CFP Committee                    2        2        4        4        3        2
            ln(mLRMC)      3        4        3        3        5        4        4        3
OSU         CFP Committee                    6        5        2        2        2        3
            ln(mLRMC)      4        5        5        4        3        3        2        2
Washington  CFP Committee                    5        4        6        5        4        4
            ln(mLRMC)      8        7        6        6        6        6        5        3*
* Clemson and Washington were tied

2016 Results: Rankings
Forecasted ranking of likelihood to make playoff (any seed, out of 1000)
Team        Method                         Rankings
Alabama     ln(mLRMC) forecasted ranking   1 1 1 1 1 1 NA
            ln(mLRMC) now ranking          1 1 1 1 1 1 1 1
Clemson     ln(mLRMC) forecasted ranking
            ln(mLRMC) now ranking          3 4 3 3 5 4 4 3
OSU         ln(mLRMC) forecasted ranking
            ln(mLRMC) now ranking          4 5 5 4 3 3 2 2
Washington  ln(mLRMC) forecasted ranking
            ln(mLRMC) now ranking          8 7 6 6 6 6 5 3*
* Clemson and Washington were tied

2015 Results: Rankings (NOW)
Team        Method         Week 7   Week 8   Week 9   Week 10  Week 11  Week 12  Week 13  Week 14
Clemson     CFP Committee                    1        1        1        1        1        1
            mLRMC          2        3        1        1        2        1        3        2
            ln(mLRMC)      7        5        1        1        2        1        1        2
Alabama     CFP Committee                    4        2        2        2        2        2
            mLRMC          4        5        8        3        1        2        1        1
            ln(mLRMC)      5        4        6        2        1        2        2        1
MSU         CFP Committee                    7        13       9        5        5        3
            mLRMC          6        4        5        9        9        5        5        3
            ln(mLRMC)      6        2        4        7        8        4        4        3
Oklahoma    CFP Committee                    15       12       7        3        3        4
            mLRMC          16       13       13       8        5        3        1        4
            ln(mLRMC)      18       12       16       10       5        5        3        4

2015 Results:
Forecasted number of times to make playoff
(out of 1000)
Team        Method         Week 7   Week 8   Week 9   Week 10  Week 11  Week 12  Week 13
Clemson     mLRMC          667      897      931      905      915      949      956
            ln(mLRMC)      749      840      897      893      955      923      976
Alabama     mLRMC          361      209      166      837      913      943      995
            ln(mLRMC)      427      240      197      858      847      931      996
MSU         mLRMC          179      213      261      54       24       569      675
            ln(mLRMC)      226      349      354      115      162      573      706
Oklahoma    mLRMC          20       46       71       119      393      758      1000
            ln(mLRMC)      12       73       16       63       142      247      1000
Slide annotations:
• Nebraska beats MSU
• MSU beats The OSU
• No Big12 championship
• Slight difference in rankings: 3rd/4th vs. 5th/6th

2015 Results:
Forecasted ranking of likelihood to make
playoff (any seed, out of 1000)
Team        Method         Week 7   Week 8   Week 9   Week 10  Week 11  Week 12  Week 13  Week 14
Clemson     mLRMC          2        1        1        1        1        1        3        2
            ln(mLRMC)      2        1        1        1        2        2        3        2
Alabama     mLRMC          5        7        6        2        2        2        2        1
            ln(mLRMC)      4        6        8        2        1        1        2        1
MSU         mLRMC          7        6        7        11       10       4        4        3
            ln(mLRMC)      6        5        5        9        6        3        4        3
Oklahoma    mLRMC          18       13       13       9        4        3        1        4
            ln(mLRMC)      21       15       14       12       8        6        1        4
Slide annotations:
• No Big12 championship
• No simulation: the season is over. We think the committee got it right!
• Ranked 2nd & 7th after week 7
• Ranked 5th & 4th after week 9

2015 Results:
What happened to The Ohio State University?
Rankings after week 12     Forecasted rankings after week 12
1. Clemson                 1. Clemson
2. Alabama                 2. Alabama
3. Oklahoma                3. Oklahoma
4. Notre Dame              4. Michigan State
5. Michigan State          5. Iowa
6. Ohio State              6. Notre Dame
7. Iowa                    7. Stanford
8. Florida                 8. Florida
9. Michigan                9. Ohio State
10. Stanford
(no other teams have >1% chance of making the playoff)

Final thoughts about March Madness

Picking the perfect bracket
There are about 9.2 quintillion ways to fill out a bracket…
And 1 way to fill out a perfect bracket
The odds of filling out a perfect bracket are not
9-quintillion-to-1 because:
(a) the tournament isn’t like the lottery where every
outcome is equally likely, and
(b) monkeys are not randomly selecting game outcomes.
Instead, people are purposefully selecting outcomes.
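The 9.2 quintillion figure is just the number of ways to pick a winner in each of 63 games. The helper `one_in` below is a hypothetical illustration of point (a): if every game is picked correctly with probability `p`, independently, the odds of a perfect bracket are 1 in `1 / p**63`, and a coin flip (`p = 0.5`) recovers the raw count.

```python
GAMES = 63                # 64-team bracket: 32 + 16 + 8 + 4 + 2 + 1 games
brackets = 2 ** GAMES     # every possible way to fill out the bracket

def one_in(p, games=GAMES):
    """Odds denominator for a perfect bracket if each pick is right with probability p."""
    return 1.0 / p ** games

coin_flip_odds = one_in(0.5)
better_than_coin_flip = one_in(0.7)   # 0.7 is an arbitrary illustrative accuracy
```

Real games are neither independent nor equally predictable, so this only shows the direction of the effect: any skill above a coin flip shrinks the odds dramatically.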

Can math help our odds?
FiveThirtyEight notes that the typical bracket has
2.5 trillion-to-1 odds of being perfect:
• https://fivethirtyeight.com/features/march-madness-perfect-bracket-odds/
BracketOdds at Illinois estimates that a historical
average winning bracket performs at 4.4 billion-to-1
• Warren Buffett may have to pay out!

The thing with perfect brackets
They depend on the year.
Let’s only look at how many people correctly select all Final Four teams:
– 1140 of 13 million brackets correctly picked all Final Four teams in 2016
– 182,709 of 11.57 million brackets correctly picked all Final Four teams
in 2015 *
– 612 of 11 million brackets correctly picked all Final Four teams in 2014
– 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
– 23,304 of 6.45 million brackets correctly picked all Final Four teams
in 2012
– 2 of 5.9 million brackets correctly picked all Final Four teams in 2011
* Only 1 bracket emerged from the round of 64 with all 32 correct picks
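To make the year-to-year swing concrete, the counts above can be turned into "1 in N" rates. The numbers are taken directly from the list; only the dictionary layout is mine.

```python
# (brackets with all Final Four teams correct, total brackets), from the slide
final_four = {
    2016: (1_140, 13_000_000),
    2015: (182_709, 11_570_000),
    2014: (612, 11_000_000),
    2013: (47, 8_150_000),
    2012: (23_304, 6_450_000),
    2011: (2, 5_900_000),
}

one_in = {year: total / correct for year, (correct, total) in final_four.items()}
hardest_year = max(one_in, key=one_in.get)
easiest_year = min(one_in, key=one_in.get)
```

The gap between the easiest and hardest years spans several orders of magnitude, which is why perfect-bracket odds depend so heavily on how many upsets a given tournament produces.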

Tips for winning your office pool

1. Don’t use RPI
• Badger Bracketology (my favorite tool!)
• Logistic Regression Markov Chain (LRMC)
• FiveThirtyEight rankings of tournament teams
• Ken Pomeroy’s rankings
• Sagarin rankings
• Massey Ratings
• ESPN’s BPI rankings
Rankings clearinghouse: http://www.masseyratings.com/cb/compare.htm

2. Pay attention to the seeds
Some seeds generate more upsets than others
• 7/10 seeds and 5/12 seeds
Historically, 6/11 seeds go the longest before facing a
1 or 2 seed.

3. Don’t pick Kansas
• Be strategic. The point is NOT to maximize your
points, it’s to get more points than your opponents
• Differentiate your Final Four
• Check ESPN for the top picked teams. Some top teams
are overvalued and others are undervalued
• Last year:
• Kansas was selected as the overall winner in 27% of brackets
(and in 62% of Final Fours) with a 19% chance of winning (538)
• UNC was selected as overall winner in 8% of brackets (with a 15%
win probability) and Villanova in 5.5%
http://games.espn.com/tournament-challenge-bracket/2016/en/whopickedwhom
https://projects.fivethirtyeight.com/2016-march-madness-predictions/
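One rough way to spot overvalued and undervalued champions is to divide a team's win probability by the share of brackets picking it. The ratio is my own illustrative yardstick, not a formula from the talk; the inputs are last year's figures quoted above (Villanova is left out because no win probability is given for it).

```python
# (share of brackets picking the team as champion, FiveThirtyEight win probability)
champion_picks = {
    "Kansas": (0.27, 0.19),
    "UNC": (0.08, 0.15),
}

# Above 1: picked less often than its chances suggest (undervalued).
# Below 1: picked more often than its chances suggest (overvalued).
value_ratio = {team: win / share for team, (share, win) in champion_picks.items()}
```

A ratio well above 1 marks a champion pick that few opponents share, which is what helps you separate from the rest of the pool.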

4. It’s totally random
A good process yields good outcomes on average
• It does not guarantee the best outcome in any given
tournament
Small pools are better if you have a good process
• Scoring can be random
• The more brackets, the higher chance that a “random”
bracket will be the best
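The pool-size effect can be illustrated with a toy simulation. It assumes each bracket's score is an independent binomial draw over 63 games, with your picks right 70% of the time and everyone else's 60%; both accuracies are invented, and real brackets are correlated because they share the same game outcomes. Ties count as a loss.

```python
import numpy as np

def skilled_win_rate(pool_size, p_skilled=0.70, p_other=0.60,
                     games=63, n_sims=5000, seed=0):
    """Share of simulated pools that the better picker wins outright."""
    rng = np.random.default_rng(seed)
    mine = rng.binomial(games, p_skilled, size=n_sims)
    others = rng.binomial(games, p_other, size=(n_sims, pool_size - 1))
    return float(np.mean(mine > others.max(axis=1)))

small_pool = skilled_win_rate(5)
large_pool = skilled_win_rate(100)
```

Even with a real edge, the best of many weaker brackets often gets lucky enough to finish on top, so the same process pays off more often in a small pool.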

Topics in Sports Analytics
ISYE 601 in Spring 2017!
• Goal: teach students data-driven methods for making
better decisions using sports as a vehicle
• Course topics:
• Linear regression
• Logistic regression
• Empirical Bayes
• Ranking methods
• Probability models and Markov chains
• Forecasting
• Game theory
• Tournament scheduling
• Networks (is my team mathematically eliminated from the
playoffs?)
…and more!

In the news!
https://punkrockor.com/in-the-news/

Thank you!
Laura Albert McLay
laura@engr.wisc.edu
punkrockOR.com
bracketology.engr.wisc.edu
Twitter: @lauramclay, @badgerbrackets
