Friday, July 31, 2009

Tinkering with Pythagorean Expectation, Now With 80% Less Math

One of Bill James' original great ideas was a formula that uses a team's runs scored and runs allowed to determine what its win-loss record should have been, rather than what it actually was. Notably, this isolated the role of luck, showing whether a team, left untouched, was likely better or worse than it had appeared to be, as explained HERE.

(2003-2004 Royals fans may be a bit touchy about just how big an effect this can be.)

One of the biggest issues with traditional Pythag expectation is that statistical outliers cannot be discounted in the overall calculation, so they can mislead you about a team's overall success and about how much random chance had to do with it. We've all seen this before: consider a pitcher whose high ERA is really the result of one disastrous start rather than a consistent level of mediocre or poor play. That ERA does not indicate the sort of performance you are likely to get; for reference, see the 2009 line of Ricky Nolasco.

Or, in our own context, a fictional week of games between two teams, the Warriors and the Aztecs:

Game 1: Warriors 18, Aztecs 0
Game 2: Warriors 2, Aztecs 4
Game 3: Warriors 0, Aztecs 1
Game 4: Warriors 4, Aztecs 5
Game 5: Warriors 13, Aztecs 8

Over these 5 games, the Warriors scored 37 total runs (7.4 per game) and allowed 18 (3.6 per).

Without doing a single bit of math, given those numbers, you would expect the Warriors to be a winning team (scoring more than twice as many runs as you allow is a good way to do that). The traditional formula says the Warriors should be 4-1 over this period, and yet they are 2-3, thanks to a couple of games of extremely high offense tilting the numbers.
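As a quick check on those numbers, here is a minimal Python sketch of the traditional formula (expected win percentage = RS² / (RS² + RA²), using Bill James' original exponent of 2) applied to the fictional Warriors-Aztecs week. The names `GAMES` and `pythagorean_win_pct` are illustrative only.

```python
# Scores from the fictional Warriors-Aztecs week: (Warriors runs, Aztecs runs)
GAMES = [(18, 0), (2, 4), (0, 1), (4, 5), (13, 8)]


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Traditional Pythagorean expectation: RS^x / (RS^x + RA^x), with x = 2."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


runs_scored = sum(us for us, _ in GAMES)
runs_allowed = sum(them for _, them in GAMES)
actual_wins = sum(1 for us, them in GAMES if us > them)
expected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * len(GAMES)

print(f"RS={runs_scored} RA={runs_allowed}")                        # RS=37 RA=18
print(f"Pythagorean wins: {expected_wins:.2f}")                      # Pythagorean wins: 4.04
print(f"Actual record: {actual_wins}-{len(GAMES) - actual_wins}")   # Actual record: 2-3
```

Rounded, 4.04 expected wins is the 4-1 projection, against an actual 2-3.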

In just a simple five-game series, the Warriors are already two wins below what the formula predicts. If similar games repeated themselves over a 162-game season, the formula would be 65 games off the team's actual pace!
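The 65-game figure is just linear scaling of that two-win miss (4 expected wins versus 2 actual) up to a full season:

```latex
\[
\frac{4 - 2}{5\ \text{games}} \times 162\ \text{games} = 0.4 \times 162 = 64.8 \approx 65
\]
```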

Mind you, this doesn't happen much. Teams don't score 18 runs and then 0 in the same week very often. Over a season, teams regress to more "normal" performances, and the odd variation found in that five-game set with the Aztecs would be lost forever, softened by reality. But that doesn't mean the flaw isn't there: it just gets buried. And these things do happen: remember when the Rangers put up a 30-spot on Baltimore in 2007? That sort of fluctuation did really significant damage to Texas' runs per game (and Baltimore's runs allowed), even though the value of runs 7 through 30 was minimal. Spread those runs out, and Texas could probably win 8 more games on the excess alone. That's roughly what Hanley Ramirez would give you over Yuniesky Betancourt. And it gets counted twice: those runs could conservatively have cost Baltimore 8 more games. That's a 16-game swing for the whole league on the results of just one game (a spectacular one, mind you). But while a 30-run game is truly remarkable, 10-run games are relatively commonplace, and 3 or 4 of those can add up to the same error.

Of course, there have been improvements. There are some excellent, math-heavy ways in which Bill James' original Pythagorean expectation has been tweaked, notably HERE, the best available material on the subject from two of sabermetrics' most valuable minds.
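To show why the run-scoring environment matters to these tweaks, here is a sketch of one widely cited example, David Smyth's Pythagenpat, which replaces the fixed exponent of 2 with one that grows with total runs per game. It is shown purely as an illustration and is not necessarily the method in the material linked above; 0.287 is the commonly quoted constant, and the function names and the 100-game sample figures are hypothetical.

```python
def pythagenpat_exponent(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Pythagenpat exponent: ((RS + RA) / G) ** 0.287 (commonly quoted constant)."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** 0.287


def expected_win_pct(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Generic Pythagorean form: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical: the same 8:7 run ratio over 100 games in a low- and a high-scoring environment.
for rs, ra in [(400, 350), (600, 525)]:
    x = pythagenpat_exponent(rs, ra, 100)
    print(f"RS={rs} RA={ra} exponent={x:.3f} win%={expected_win_pct(rs, ra, x):.3f}")
```

With the run ratio held fixed, the higher-scoring environment gets a larger exponent and so a higher expected win percentage, which is the kind of environment dependence that is hard to pin down mid-season.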

But almost all of these calculations require a completed season to review, specifically because they rely on park effects and the run-scoring environment. Most of these variables are anything but static, which makes an end-of-season predictor very difficult. Of course, as has been proven time and time again, nothing in baseball follows a perfectly laid path: players get hurt, players get traded, and schedules are unbalanced.

But with a big enough sample (say, after the All-Star break), it's easy to crank out a simple tool with pretty alarming accuracy (and embarrassingly low mathematical acumen) to determine roughly how a team will finish.

The first thing you need to know is the league-wide runs scored per game (in the case of 2009, 4.57 as of this writing). Interestingly, baseball hasn't varied much on this; look at each year since 2000:

2009 (to date): 4.57
2008: 4.65
2007: 4.80
2006: 4.85
2005: 4.59
2004: 4.81
2003: 4.73
2002: 4.61
2001: 4.77
2000: 5.14 (!)

Only one of those numbers really stands out, and it comes from the ... ahem ... offensive explosion of the late '90s and early '00s, which I will leave for discussion elsewhere.

So barring some obscene 1968 pitching or 2000 hitting numbers, the average number of runs scored will be somewhere in the 4s. And since I've never seen a team score exactly 4.57 runs in a game, you either score above the norm (5 or more) or below it (4 or fewer). Simple, right? Well, not quite. The scale is top-heavy because it is open-ended on the high side (you can score unlimited runs, but you can't score fewer than zero). This doesn't have a huge effect, but it makes scoring or allowing exactly 4 runs the grey area, since you will win roughly half the time when you do.
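The threshold logic can be written down directly from the league average. This is a sketch under my own assumption (not a rule stated here) that the whole-number part of the league average is the grey-area run total, counting as a defensive success but not an offensive one; `success_thresholds` is a hypothetical helper. It reproduces the 5-or-more / 4-or-fewer split for every season listed except 2000.

```python
import math

# League-wide runs per game from the list above (2009 is to date).
LEAGUE_RPG = {
    2009: 4.57, 2008: 4.65, 2007: 4.80, 2006: 4.85, 2005: 4.59,
    2004: 4.81, 2003: 4.73, 2002: 4.61, 2001: 4.77, 2000: 5.14,
}


def success_thresholds(league_rpg: float) -> tuple[int, int]:
    """Return (fewest runs scored that count as an offensive success,
    most runs allowed that count as a defensive success).

    Assumption: the whole-number part of the league average is the grey area;
    allowing it counts as a defensive success, scoring it does not.
    """
    grey_area = math.floor(league_rpg)
    return grey_area + 1, grey_area


for year in sorted(LEAGUE_RPG):
    print(year, LEAGUE_RPG[year], success_thresholds(LEAGUE_RPG[year]))
```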

Now, to predict a team's win-loss record going forward, we use a method fans have relied on for a long time, only with defined values:

1. If a team scores 5 or more runs, it's an offensive success: that side of the game has given the team an adequate or better chance to win.

2. If a team allows 4 runs or fewer, it's a defensive success.

3. Count the total number of "successes" your team has for the year (a single game can produce 2 successes, 1, or none, since we are measuring different sides of the game). Keep the offensive and defensive successes separate: we'll use them again later.

4. Divide the total number of successes by 2. This is your expected number of wins to this point in the season. The gap between this and the actual win total should be small; it can be attributed to bad luck, or it could be a red flag for bad managing, especially if it approaches 5 or so wins per 100 games.

5. Divide the number of offensive successes and the number of defensive successes, separately, by the number of games played to that point. This gives the rate at which your team is getting adequate performances on each side.

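6. Extrapolate those success rates over your remaining games (again, half the combined successes is the expected number of wins), and add the result to the wins your team already has to determine your expected W-L for the end of the year.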
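Steps 1 through 6 translate into a short Python sketch. It assumes a list of per-game `(runs_scored, runs_allowed)` pairs and a 162-game season; all names are hypothetical. Step 6 adds the expected wins over the remaining games to the wins already banked, which is how the Royals figures below work out.

```python
from dataclasses import dataclass

OFFENSE_MIN_RUNS = 5   # step 1: scoring 5 or more is an offensive success
DEFENSE_MAX_RUNS = 4   # step 2: allowing 4 or fewer is a defensive success


@dataclass
class Projection:
    offensive_successes: int
    defensive_successes: int
    expected_wins_to_date: float
    projected_final_wins: float


def project(games: list[tuple[int, int]], actual_wins: int,
            season_games: int = 162) -> Projection:
    """games holds (runs_scored, runs_allowed) for every game played so far."""
    played = len(games)
    # Step 3: count each side separately; a game can yield 0, 1, or 2 successes.
    offense = sum(1 for scored, _ in games if scored >= OFFENSE_MIN_RUNS)
    defense = sum(1 for _, allowed in games if allowed <= DEFENSE_MAX_RUNS)
    # Step 4: half of all successes is the expected win total so far.
    expected_to_date = (offense + defense) / 2
    # Step 5: per-game success rate for each side.
    offense_rate = offense / played
    defense_rate = defense / played
    # Step 6: expected wins over the remaining games, added to the wins already banked.
    remaining = season_games - played
    projected = actual_wins + (offense_rate + defense_rate) / 2 * remaining
    return Projection(offense, defense, expected_to_date, projected)


# The fictional Warriors week: (Warriors runs, Aztecs runs).
warriors_week = [(18, 0), (2, 4), (0, 1), (4, 5), (13, 8)]
print(project(warriors_week, actual_wins=2))
```

On the Warriors' week this counts 2 offensive and 3 defensive successes, or 2.5 expected wins against an actual 2 (the traditional formula said 4). A five-game sample is, of course, far too small for the extrapolation step to mean much.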

All right, everybody still with me? Good. Because it's fun, and if they haven't done a good enough job themselves, let's rip apart the Kansas City Royals.

2009 W-L: 40-61

Offensive Successes: 39 (38.6%)
Defensive Successes: 51 (50.5%)

Expected W-L: 45-56

Uh-oh, Trey.

So what we see here is nothing too terribly surprising ... The Royals are a mediocre pitching team (in a future post, I may see what happens if you replace Greinke with an average starter) and a bad hitting team, and these numbers reflect that.

Expected W-L over 162: 67-95

That's sure not good, is it? Luckily more waiver wire gritty speedsters are on the way.
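The Royals figures can be reproduced from the season counts alone. A minimal sketch, assuming (as the 67-95 figure implies) that the projection adds expected wins over the remaining 61 games to the 40 actual wins; the function name is hypothetical.

```python
def project_from_counts(wins: int, losses: int, off_successes: int,
                        def_successes: int, season_games: int = 162):
    """Return (expected W, expected L) to date and (projected W, projected L) for the season."""
    played = wins + losses
    # Step 4: half the combined successes = expected wins so far.
    expected_wins = round((off_successes + def_successes) / 2)
    # Step 5: combined success rate, halved to a per-game win rate.
    win_rate = (off_successes + def_successes) / (2 * played)
    # Step 6: actual wins so far plus expected wins over the rest of the schedule.
    final_wins = round(wins + win_rate * (season_games - played))
    return (expected_wins, played - expected_wins,
            final_wins, season_games - final_wins)


# 2009 Royals through 101 games (figures from this post).
print(f"Offense: {39 / 101:.1%}, Defense: {51 / 101:.1%}")  # Offense: 38.6%, Defense: 50.5%
print(project_from_counts(40, 61, 39, 51))                   # (45, 56, 67, 95)
```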


  1. Beautiful stuff, Walter.

    Quick question--over how many games is it appropriate to wait before the extrapolation step?

  2. I've had pretty good luck with post-All-Star-break samples in the past. After 100 games, 5 wins is the biggest variation I've seen yet. Last year I was off on the Royals' total by -3.