Episode 8: July 23, 2010
A recent ESPN article claimed that Barry Bonds was a more dominant home run hitter than Babe Ruth, and rewrote the all-time home run leader list using some flawed math. I explain how to use z-scores to determine which players are actually the most dominant, and how to compare dominance across eras.
Please note: this is not an exact transcription of the episode.
Dominance. That's what every player wants, right? To dominate the sport. That's what Pujols has done for the past decade. Ted Williams dominated in the 40s. Ruth dominated in the 20s. Barry Bonds broke the single-season home run record in 2001 and the career record in 2007. But is Barry Bonds the most dominant home run hitter of all time? I submit: no freaking way.
I'm Alex Reisner...
Now, I'm *not* talking about steroids here, so whatever you think of the man named Barry Bonds, forget about it for the next 10 minutes because I'm just talking about numbers.
Peter Keating wrote an article in a recent issue of the ESPN Magazine in which he tries to "fix" the career HR leader list. He takes the list and "adjusts" players' totals according to their "dominance" in their era, so it's not just based on raw totals, but how hard it was for players to get those totals.
This is a great idea, but Keating completely screwed it up. I don't even want to read you his list because it's so wrong. Well, OK, I'll read you the list, but seriously, you should probably try to forget it. Now remember, this is the all-time home run leader list, with totals adjusted based on dominance:
1. Hank Aaron 724
2. Babe Ruth 663
3. Barry Bonds 660
4. Mel Ott 650
5. Willie Mays 628
6. Reggie Jackson 595
7. Frank Robinson 578
8. Ted Williams 569
9. Mike Schmidt 557
10. Harmon Killebrew 552
So there are some general things I agree with in this list:
* Mel Ott's and Ted Williams' HR totals go up.
* Bonds loses about 100 HRs.
Beyond that, it's pretty weird. Babe Ruth should not lose home runs because of a lack of dominance, and I don't think Reggie Jackson should gain any. The problem with this list is that he uses some really naive math, which also leads him to conclude that "in 2001, Bonds was 8.02 HRs per 100 PAs better than the league, a measure of dominance that exceeds any posted by Ruth."
Now that's just wrong. It's just, absolutely wrong.
Before I get into the math and *why* he's so wrong, let me just warn you that I'm going to throw some numbers at you and...it's not anything that confusing, just want to give you a heads-up. Also, if you want to go deeper than what I talk about here, you can see my web site (alexreisner.com/baseball) for a lot more detailed information.
So let's think about dominance for a minute... It helps to get away from baseball so let's imagine a hot dog eating contest where the players have 10 minutes to eat as many hot dogs as they can. There are 100 players in the contest and let's say the winner eats 40 hot dogs. The other 99 guys are tied for second place with 20. I'd say the winner dominated pretty thoroughly. I think you probably agree, but what do I mean when I say "he dominated?" I think I mean things like this:
1. he had way more than the average (20)
2. everyone else was very close to the average
3. nobody was very close to his score
So I think when we're talking about dominance, what everyone else did is just as important as what the player in question did. For whatever reason it was very difficult to eat more than 20 hot dogs. I can imagine a lot of reasons it's hard to eat 20 hot dogs, but these guys are professionals...maybe they got bigger as the contest went on...we don't know exactly why it was so hard, but the numbers tell us that 99 out of 100 people ate exactly 20, so clearly, it was *very* hard to do any better. This is what makes the winner's score so dominant: it's not just that he was way ahead of everyone else, it's that everyone else performed so similarly. There weren't any other standouts.
Now let's imagine a second contest. In this one the winner eats 50. The guy in second place eats 45, 3rd place eats 40, 4th place eats 35, and on down to 20. 80 guys are tied with 20, and the remaining 14 guys ate 15, 10, or 5 each. So this is pretty similar to the first contest in that most people ate 20 hot dogs. The average score in both contests is about 20. But in the second, 6 guys ate *more* than 20, and second place was only 5 behind the winner. So even though the winner of this contest ate more than the winner of the first, I'd say he was less dominant because the runners-up were closer.
I think this is our intuition: that you dominate if you win and nobody is even close. Now, the system Peter Keating used in his article doesn't recognize dominance like this. According to that system, the winner of our second contest is more dominant than the winner of the first simply because he was further above the average than the first winner. But that's not enough to establish dominance.
What Keating should have done was look at z-scores. Z-scores are great. The z-score is a statistical method for measuring dominance. It takes everyone's scores into account and gives you a single number for each player. I've calculated z-scores for the hot dog contests: the z-score for the first winner is 10, and the z-score for the second winner is just under 5, which means the first winner was about twice as dominant as the second, which is exactly the kind of result we're looking for. It's a statistical device that mirrors our intuition about what it means to dominate.
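If you want to check the hot dog numbers yourself, here is a minimal Python sketch. A z-score is just how far a score sits above the average, divided by the standard deviation of everyone's scores. Two things here are my assumptions and not from the episode: the way the last 14 eaters in the second contest split between 15, 10, and 5 (the split below is made up so the average comes out to 20), and the use of the population standard deviation (`pstdev`). The function name `z_score` is also just for illustration.

```python
from statistics import mean, pstdev

def z_score(value, scores):
    """How many standard deviations `value` sits above the mean of `scores`."""
    return (value - mean(scores)) / pstdev(scores)

# Contest 1: one winner with 40, the other 99 eaters tied at 20.
contest1 = [40] + [20] * 99

# Contest 2: 50, 45, 40, 35, 30, 25, then 80 eaters at 20.
# ASSUMPTION: the split of the last 14 (ten at 15, one at 10, three at 5)
# is invented for this sketch; it keeps the average at 20.
contest2 = list(range(50, 20, -5)) + [20] * 80 + [15] * 10 + [10] * 1 + [5] * 3

for name, scores in (("contest 1", contest1), ("contest 2", contest2)):
    top = max(scores)
    gap = top - mean(scores)  # the "above average" measure on its own
    print(f"{name}: winner ate {top}, gap over average {gap:.1f}, "
          f"z-score {z_score(top, scores):.1f}")
```

With these inputs the second winner has the bigger gap over the average, but the first winner's z-score comes out close to 10 while the second winner's lands in the neighborhood of 5. The exact second figure moves a little depending on the assumed split and on which standard deviation formula you pick.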
So, getting back to baseball, let's look at Bonds and Ruth using z-scores. As Keating says:
in 2001 Bonds had a little more than 8 HRs per 100 PAs above the league average.
In 1927 Ruth had a little less than 8 HRs per 100 PAs above the league average, so by the flawed method, Bonds does look a little better. But let's look at the league averages in those years:
In 2001 the league average was 2.9 HRs per 100 PAs.
In 1927 it was under 1.
So the league average for home runs was much lower in 1927, but the really important thing is that if you look at the number of players who were very close to the average, you'll see that in 1927, nearly everybody was very close to the average, whereas in 2001 there was a much wider range of HR totals.
So if we look at z-scores for those seasons:
Bonds in 2001 is 4.4 and
Ruth in 1927 is 6.4,
which means Ruth was almost 50% more dominant than Bonds. To get specific: in 2001,
Bonds hit 73 HRs,
Sosa hit 64,
Luis Gonzalez hit 57,
Shawn Green and Todd Helton each hit 49,
Richie Sexson hit 45, and so on.
Back in 1927:
Ruth hit 60,
Gehrig hit 47,
and in third place was Tony Lazzeri with 18.
There were only 8 players in the whole league who even hit 10 HRs! So yeah, Ruth hitting 60 when most guys couldn't hit 10 was way more dominant than Bonds hitting 73 when a lot of guys hit 30. Anyone who tries to tell you that Bonds was more dominant than Ruth is missing something. Trust me on this. 1927 wasn't even Ruth's most dominant season. In 1920 he hit 54 HRs and nobody else hit even 20. The next year Ruth hit 59 and nobody else hit more than 24. Actually, 7 of the 10 most dominant home run hitting seasons of all time are Babe Ruth's. Bonds' 2001 season is nowhere near any of them. In terms of z-score Bonds' 2001 is probably most similar to Mike Schmidt's 1980 when he hit 48 HRs and Bob Horner was second-best with 35.
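Here's a rough back-of-the-envelope sketch of why nearly the same gap over the league average produces such different z-scores. Since a z-score is the gap divided by the spread (standard deviation), you can divide the gap by the z-score to see what spread it implies. Two loud assumptions: I'm treating the quoted z-scores (4.4 and 6.4) as if they were computed on the same HRs-per-100-PAs rate, which the episode doesn't confirm, and Ruth's gap is only given as "a little less than 8", so the 8.0 below is a rounded stand-in, not his actual number. The names `seasons` and `implied_spread` are made up for this example.

```python
# Figures quoted in the episode, except where marked as assumptions.
seasons = {
    "Bonds 2001": {"gap": 8.02, "z": 4.4},
    # ASSUMPTION: "a little less than 8" rounded to 8.0 for illustration.
    "Ruth 1927": {"gap": 8.0, "z": 6.4},
}

def implied_spread(gap, z):
    """Standard deviation implied by the relationship z = gap / spread."""
    return gap / z

for name, s in seasons.items():
    spread = implied_spread(s["gap"], s["z"])
    print(f"{name}: implied spread of about {spread:.2f} HRs per 100 PAs")

# How much bigger Ruth's z-score is than Bonds'.
ratio = seasons["Ruth 1927"]["z"] / seasons["Bonds 2001"]["z"]
print(f"Ruth's z-score is about {ratio:.2f} times Bonds'")
```

Under those assumptions the implied spread for 2001 is noticeably wider than for 1927, which is the same point as the home run lists: a lot more hitters were far from the average in 2001, so the same gap counts for less.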
I've been looking at z-scores for years. I created a web site that lists z-scores for every standard offensive stat for every player season going back to the 1800s and I can tell you, without any hesitation, that when it comes to hitting home runs, *nobody* has ever been more dominant than Babe Ruth. No matter how you measure it, whether it's home run rate or raw totals, not only are his numbers dominant, his dominance is dominant. I'd talk about the runners-up but it's like winning a marathon by two hours. Who cares who finished second? Almost every year that Ruth played from 1918 to 1931 was among the most dominant home-run hitting seasons ever. No one else in history has more than 5 really dominant seasons, and Ruth has 13. And those guys with 5 really dominant seasons are not who you probably think. It's Cactus Cravath who was the big home run hitter just before Ruth, and Mel Ott in the 30s... Those guys are the runners-up but Ruth's dominance was much greater and much longer-lasting.
Bonds' 73 home runs in 2001 came when the league average was at its highest ever. We don't yet know all the reasons, but it was waaaaay easier to hit home runs in 2001 than it was in 1927. We know because more players did it.
Anyway, you're probably wondering: using z-scores, who are the most dominant players around today? Well, there really aren't any very dominant home run hitters. The guys who are worth talking about are the young guys hitting triples and stealing bases: Carl Crawford, Michael Bourn, Jacoby Ellsbury...
But the best of these, over the past 5 years, is Jose Reyes. When Reyes is healthy he is one of the best base stealers and triples hitters ever. His raw totals might not be as high as some guys from 50 or 100 years ago, but it was a different game back then. With the z-score we can level the playing field and see how great Reyes is. He led the National League in 2005, 06, and 08, hitting 17, 17, and 19 triples respectively. Again, those aren't the highest numbers you've heard, but the league average has gone steadily down since 1920, and with today's emphasis on power hitting and walks, hitting triples is becoming sort of a lost art.
As for pitchers, a few of the big boppers just retired. Pedro Martinez, Randy Johnson, and Curt Schilling were three of the most dominant pitchers in several stats. If you look at enough z-scores I think you can actually make a case for Pedro Martinez being the second or third most dominant pitcher ever. Of course his peak wasn't that long, but during those few years he was scarier than just about anyone. I'm talking about Bob Gibson 1968 territory, or Walter Johnson 1912.
As for active pitchers, the most impressive stat is Roy Halladay's complete games. In an era when pitchers are not really expected to go more than 7 innings, Roy Halladay has pitched 9 complete games each of the past two years, and he already has 7 this year. Halladay is another master of a lost art form. There's really no one like him. Even Ubaldo Jimenez, in all his 15 wins, has only finished 3 games.
Anyway, you can look at all the z-scores you want if you go to my baseball stats web site at alexreisner.com/baseball/stats. It only has offensive stats but I'll be coming out with a new site soon that will have ALL the stats.
Speaking of dominance, if you can, you *have* to watch the Texas Rangers. Josh Hamilton, when he's healthy, might be the most exciting player I've ever seen. The guy's got ridiculous power to all fields, he's fast, he can field, he just hit his 31st double, he leads the league in extra base hits, batting average, total bases, runs created, and he's in the top ten in just about everything else. Vlad Guerrero usually hits before him so you've got two of the most exciting players in the game hitting back-to-back. You've got Michael Young and Nelson Cruz with 12 home runs each. Ian Kinsler and Elvis Andrus in the middle infield... This is a great team. If their pitching holds up and the team's bankruptcy situation doesn't become a distraction, they may cause some real trouble in October.
So anyway, that's my advice: check out my z-score web site at alexreisner.com/baseball/stats, and watch some Rangers games!
I'm Alex Reisner...
All content on this web site and in podcasts copyright © 2010-17 Alex Reisner.