Episode 15: September 10, 2010
Right from its beginnings in the 1800s, baseball has been accompanied by a barrage of numbers. Why are statistics so important to baseball? Why did they develop so naturally, and why have they remained so fascinating for so long? What makes baseball different from football, basketball, and hockey?
Please note: this is not an exact transcription of the episode.
It seems like every other time I watch the Mets this year, Ike Davis hits a home run. I've only watched around 20 Mets games and I know I've seen him hit at least 12 home runs (including one in person against the Twins that was estimated at 440 feet). If the Mets have played 140 games, Ike Davis must have around 70 home runs already, and he's just a rookie. Why isn't everyone talking about it?
I'm Alex Reisner...
So, I looked it up and, well, it turns out Ike Davis has actually only hit 18 home runs. So it's just a coincidence that I've seen him hit 12. That's why we need to keep track of stats in baseball. We need people who score games and compile the numbers, and spend hundreds of hours making sure they're accurate. Without them we'd have to rely on our own experience and memory, and I'd have to assume that Ike Davis is one of the greatest home run hitters ever.
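To put numbers on that mistake, here is a rough sketch in Python of the naive projection I was making in my head. The function name is made up for illustration. The inputs are just the figures mentioned above: 12 home runs seen in about 20 games watched, roughly 140 games played, and 18 actual home runs.

```python
def project_season_total(events_seen, games_watched, games_played):
    """Naively scale what you saw in a few games up to the whole season."""
    return events_seen * games_played / games_watched


games_played = 140   # approximate Mets games played so far (from the text)
actual_home_runs = 18  # what the record books say (from the text)

# "Every other time I watch" means about one home run per two games.
every_other_game = project_season_total(1, 2, games_played)

# The literal count: 12 home runs seen in roughly 20 games watched.
from_my_viewing = project_season_total(12, 20, games_played)

print(every_other_game)   # 70.0
print(from_my_viewing)    # 84.0
print(actual_home_runs)   # 18
```

Either way the projection is several times the real total, which is the whole point: the games I happened to watch were not a representative sample.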
[response to "the game is played on the field"]
A lot of people assume that baseball statistics are a new thing, but they're not. There have been a lot of new stats developed in the past 40 years to help us measure different aspects of the game, but daily box scores and lists of batting averages have been around since the 1870s. You might imagine that back then they were very crude, but you'd be wrong. Box scores from the 1800s were different, because the game was different, but they were at least as detailed as they are today. In fact, back then you could often find 10 columns or more of fielding stats alone! They didn't have the technology to keep accurate running totals for the season, but the single-game reports were extremely detailed. So baseball has been accompanied by a barrage of numbers pretty much from the beginning.
Now, there are statistics in football, basketball, and hockey, but you don't hear people talking about those numbers as much as you hear baseball fans talking about batting averages and home runs. I know Wilt Chamberlain scored 100 points in a game once, but there aren't a lot of numbers as meaningful as baseball's 714, 511, or 61. A lot of baseball fans can tell you without looking it up that Willie Mays hit 660 home runs. These days you see and hear more football and basketball stats on TV, but still not as many as you do during baseball games. And this becomes even more true as you go further back in time.
So why does baseball have all these stats? Well, the popular answer is that it's easy to collect stats in baseball because there are a lot of natural breaks in play. Something happens, then there's a break, then something else happens, then there's a break...so it comes at you in short bursts, and if you look around the ballpark during those breaks you'll see people writing down what just happened in their scorebooks. That's right: people, casual fans, actually keep track of statistics during a baseball game as a way to stay engaged while they're waiting for the next thing to happen. Fathers even teach their sons how to keep score. Watching the game and simultaneously turning it into numbers is actually part of the culture of baseball.
So I think the pace is a reason why it's easier to keep stats in baseball than in sports like basketball, hockey, and soccer, but football happens one play at a time. Overall it moves a little faster than baseball but why aren't football fans walking around with scorebooks? Well, first of all baseball is a pastoral game. I know that's kind of a cliche, but it's true that the pace of baseball is intentionally leisurely. There's a thing called the 7th inning stretch. Baseball is played in the spring and summer when the weather is nice. In the original concept of a baseball field there's no outfield fence, so fans used to sit right on the field. It's like a picnic! Football is played in the winter when you don't really want to be fumbling around with a pencil and paper. It's just a different culture.
Secondly, and this is also part of the standard answer, in baseball it's easier to isolate each player's contribution. In football you have a lot of players surrounding the ball, blocks, picks, handoffs...there's never really a clear one-on-one matchup like batter vs pitcher, and even though you could keep track of yards gained and lost by quarterbacks, receivers, and rushers, it's not obvious what to record for guards, tackles, and linebackers, other than sacks. In other words, every game in every sport has a story, but it seems more obvious in baseball how to build a narrative through the simple notation of each play. What you'd end up with in football, I think, would be much less complete.
Anyway, I've never been completely satisfied by this argument. I think it's true that it's probably *easier* to record statistics for baseball than it is for other sports, but this still doesn't really explain to me what the original *motivation* was, or why it has continued to be so fascinating for over 100 years, while relatively little has happened with stats in other sports.
So anyway, here's my theory:
1. In baseball each batter has a very small number of opportunities in each game. In football a running back might take 10 or 15 handoffs, and in basketball a point guard might take 15 or 20 shots. But in baseball, batters usually come to the plate just 4 or 5 times. Of course there are a lot of games and so there are a lot of at-bats in a season, but that's exactly the point: what you see when you watch a single game is a tiny fraction of each player's season--it's nowhere near enough to really tell how good the player is. Think about it: in football, with just 16 games in a season, when you watch one game you could actually be seeing a significant portion of a player's career.
2. Also, in most other sports, even when a player isn't receiving a pass or taking a shot, you can watch them run around and react to what's happening on the field, and get a sense of their skill and athleticism. In baseball, half the players aren't even on the field, and most of the ones that are, are just standing around waiting! I have no idea what the actual numbers are, but I'd guess, with the exception of the pitchers, each player spends a total of maybe 10 minutes per game actually fielding balls and in the batter's box. The rest of the time is spent waiting in the field, adjusting batting gloves, getting signs from the coach, etc. This isn't a knock on baseball or baseball players, and I'm not saying baseball isn't physically demanding, it's just the nature of the sport that there isn't a lot of opportunity for fans to observe the players' skills. In statistical terms, each game gives you a very small sample size. At a basketball game you can easily see within 5 minutes how well every player on the floor moves. At a baseball game you might never even see a player run. A first baseman could strike out 4 times and never have to go farther than from first base to the dugout, and that same first baseman could also be the best player in the league...you just saw him on a bad day.
3. Finally, the things you *do* get to see baseball players doing mostly have a very small effect on the outcome of the game. Consider these situations:
* A centerfielder makes a diving catch...to make the third out with nobody on base.
* A batter hits a leadoff triple...but the next three batters strike out.
* A pitcher strikes out the side on 10 pitches...with his team down by 6 runs.
None of these events affect the score of the game, so what do they mean? In football, players get credit for gaining a certain number of yards even when they don't score a touchdown, but in baseball it's not obvious how much a single is "worth." Is it 25% of a run? 15%? Does it matter if anyone's on base? Or what inning it is? Or what the score is? I think these are all pretty natural questions. Hitting a baseball is so hard...there's such a low rate of success...it happens rarely enough that I think there's a natural desire to remember and give players credit for hits, even when they don't score runs. With no obvious way to award a fraction of a run, or an assist like in basketball, it's not surprising that newspapers in the 1800s simply started tracking every event that seemed like it might be important.
In other words, I think part of the reason for the popularity and continuing evolution of baseball statistics is the fundamental complexity of runs and the difficulty of valuing hitting and pitching and fielding in a game where there's so little scoring and where players don't normally score runs directly but rather add to their team's run-scoring *potential*. (So there are a lot of discrete events in a baseball game, most of them can be attributed to a single player, and it's not immediately clear what any of them will mean, but it's easy to tell the whole story later if you write them down.)
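The small-sample point from item 2 can be made concrete with a little arithmetic. This is only a sketch under simplifying assumptions that are mine, not part of the episode: a hypothetical hitter who gets a hit in 30% of his at-bats, with every at-bat treated as independent of the others. The function name is made up for illustration.

```python
def prob_hitless_game(hit_prob, at_bats):
    """Chance of going hitless, assuming independent at-bats
    that each succeed with the same probability."""
    return (1 - hit_prob) ** at_bats


# Hypothetical .300 hitter with 4 at-bats in the one game you attend.
print(round(prob_hitless_game(0.300, 4), 4))  # 0.2401
```

So under those assumptions, even a very good hitter goes hitless in roughly one game out of four. If that happens to be the game you see, your eyes alone tell you almost nothing about him, and that's the gap the numbers fill.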
Even now, after 150 years of baseball, we still don't fully understand how to calculate a single player's overall run-producing ability. Through the years various people have calculated the relative values of singles, doubles, triples, and home runs... This became best known through Pete Palmer's "linear weights," but the approach doesn't take enough of a player's contributions into account and the results are not considered very useful anymore. Bill James invented the Runs Created stat but the formula changes every few years. More recently we have WAR and WARP, but those formulas are changing even more than Runs Created.
The fundamental problem of how individual players contribute to runs is so obvious in baseball that if you have the kind of brain that likes to solve problems, you can get obsessed pretty quickly.
The seemingly natural marriage between baseball and numbers doesn't mean that you can't collect meaningful stats in football and even basketball and hockey. It's hard to do with the naked eye, but technology is now at a point where, with the right equipment, we can gather all kinds of data on player position, speed, and location at any moment during a game and I have no doubt that eventually that information will lead to all kinds of breakthroughs even in the most fluid sports like hockey.
It also doesn't mean that the data we collect in baseball is the right data. For 130 years we recorded what we could see, because that's all we could do. Now every Major League park has cameras which create computer models of every pitch and record all kinds of data on velocity and movement. Within the next few years it's likely we'll also have cameras on the light towers recording fielder and baserunner positions. Personally I've been waiting a long time for that because I don't think we can have really meaningful fielding statistics until it happens. Or at least we won't really know how accurate and reliable our current fielding stats are until we have that data.
And that leads me to wonder if our hitting stats are as meaningful as we think they are. Hitting stats are, in a sense, the backbone of baseball statistics--they're the ones that are most familiar to the general public, and the basis for the majority of research over the past 50 years. But you might know the parable about the man who went for a walk around his neighborhood one night and lost his wallet. He then spent hours looking for it inside his house because that's where he could see...because it was dark outside. We've been looking for answers to baseball's big questions about clutch hitting and streakiness and how to value players in the data we have on singles, doubles, home runs, who's on base, how many outs, etc. What if the answers are in data on fielder positioning, pitching mechanics, batter reaction time, or some other data that hasn't been collected yet? Obviously as fans and researchers there's not much we can do--the technology has to evolve and MLB has to make it available, but I think it's important to recognize how far off base we could be, and how little we really understand. Baseball data analysis has come a long way in the past 50 years, but we're at the beginning of a whole new era. For the first time we have access to data that can't be observed and recorded on a scoresheet by the casual fan. If you think there are a lot of baseball statistics now, well, there are about to be a lot more.
All content on this web site and in podcasts copyright © 2010-18 Alex Reisner.