Figuring Probability Fluctuations in Baseball

ALDEN MEAD

ALL OF US ARE familiar with stories such as the following appearing in newspaper sports sections at the end of a season:
And a year later, neither team having made important changes:
If the margin in these two races had been one game instead of nine, the stories would, of course, have been quite different, because everyone recognizes that one game is too small a difference to constitute a proof of real superiority. The question ought obviously to suggest itself: Over a 162-game schedule, how far apart can two teams finish

The answers to these questions are important, both to fans and researchers trying to interpret baseball statistics and to club executives trying to decide whether a player has lost his skills (or has improved dramatically), whether a major overhaul of a team is in order, etc. It appears that no one has even posed these questions in connection with baseball, much less answered them. Certainly they are not part of the routine discourse of the national pastime.

Questions such as these can be answered, at least in large part, by techniques of

In this article, it will always be assumed that each at-bat, game, etc., is an

Many baseball statistics can be reduced to a series of events, each one of which can be classified as either a success or a failure. For instance, an at-bat is a success (for the hitter) if he gets a hit, otherwise a failure; a game is a success if one's favorite team wins, otherwise a failure, etc. For things like this, the calculation of the SD turns out to be quite easy. The result is:

SD = square root of [(number of successes) x (number of failures) / (number of attempts)]

For example, consider a .300 hitter who comes to bat 500 times in a season. On the average, he should have 150 hits (successes) and 350 failures. To get the SD in the number of hits, we multiply 150 by 350, getting 52,500, divide by 500, getting 105, and finally take the square root, getting 10.2. To get the SD in the batting average, the SD in hits must be divided by the number of at-bats, 500 in this case, giving .020, or 20 points in the batting average. About 68% of the time, therefore, our .300 hitter will actually hit between .280 (one SD below the expected result) and .320 (one SD above).
The remainder of the time he'll bat either above .320 or below .280. In fact, about 20 points is typical for the SD in the season batting average of a regular player.

Note that the result would be different for a career: In 10,000 at-bats, our hypothetical .300 hitter would expect 3,000 hits and 7,000 failures. Repeating the same calculation, we get 45.8 for the SD in hits, .005 for the SD in batting average. Just as one expects, the chance of large fluctuations in the batting average gets smaller the more at-bats are taken; contrary to what many people expect, though, the SD in the number of hits actually gets bigger as one includes more at-bats.

Some other season standard deviations (SD's) are:

Home runs by a slugger who averages 36: 6
ERA of a pitcher with about 250 IP: about 0.30
Games won by a team over a 162-game schedule: 6
Difference in games won by two evenly-matched teams (taking into account the fact that they play each other part of the time): 9 games

These fluctuations doubtless are greater than most people expect. In particular, the events described at the beginning of the article can easily be accounted for by fluctuations and would not necessarily imply any change in the actual skills of players or teams. The hypothetical changes in team standings, batting averages, home runs and ERA all correspond to fluctuations of one SD in one direction in the first year and in the opposite direction the next, something that could very easily happen if both seasons had been played with the exact same teams using a table game.

If we define a regular non-pitcher as one with 300 or more at-bats and a regular pitcher as one with 150 or more innings pitched, then there are about 200 regular non-pitchers and 100 regular pitchers in a typical year. This is a large enough number that even relatively rare large fluctuations can happen a few times.
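The arithmetic behind these standard deviations can be checked with a short sketch. It assumes only the success/failure rule worked through above (multiply successes by failures, divide by attempts, take the square root); the function names are illustrative and not from the article. The last figure treats two evenly-matched teams as independent, a simplification that ignores the games they play against each other.

```python
import math

def success_sd(successes, failures):
    """SD in the number of successes: sqrt(successes * failures / attempts)."""
    attempts = successes + failures
    return math.sqrt(successes * failures / attempts)

def rate_sd(successes, failures):
    """SD in the success rate (e.g. batting average): SD in successes / attempts."""
    return success_sd(successes, failures) / (successes + failures)

# .300 hitter over a 500 at-bat season: 150 hits, 350 failures.
season_hits_sd = success_sd(150, 350)
season_avg_sd = rate_sd(150, 350)

# The same hitter over a 10,000 at-bat career: 3,000 hits, 7,000 failures.
career_hits_sd = success_sd(3000, 7000)
career_avg_sd = rate_sd(3000, 7000)

# An average team over a 162-game schedule: 81 wins, 81 losses.
team_wins_sd = success_sd(81, 81)

# Simplifying assumption (not the article's full calculation): two independent
# teams, so the SD of the difference is sqrt(2) times the SD of one team.
difference_sd = math.sqrt(2) * team_wins_sd

print(f"season: hits SD {season_hits_sd:.1f}, average SD {season_avg_sd:.3f}")
print(f"career: hits SD {career_hits_sd:.1f}, average SD {career_avg_sd:.3f}")
print(f"team wins SD {team_wins_sd:.1f}, difference SD {difference_sd:.1f}")
```

The SD in hits grows from season to career while the SD in batting average shrinks, which is the pattern described in the text.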
In a typical year, one would expect that:

About 30 regulars will have batting averages 20 points or more above what they should be; of these, about four will bat 40 points or more higher than they should.
About 15 regular pitchers will have ERAs 0.30 or more below what they should be; of these, about two will have ERAs 0.60 or more below.
About four teams will win six (or more) games above the number they should win.
About once in two years, one of the 26 teams will win 12 (or more) games above the total it deserves.

Obviously, results such as these are important for the understanding of baseball statistics. If a player's batting average in a given year is 20 points above his previous lifetime average, it does not necessarily indicate real improvement, but could equally well be a fluctuation (owners negotiating with players' agents, please note). The same holds for all deviations from what we expect which are not much more than one SD in either direction. In particular, analyses such as the hypothetical stories with which we began this article are nonsense.

(i.e., without a fundamental change in the game, or the appearance of a truly extraordinary player), so that further improvement must come about by fluctuation. It is then a straightforward matter to calculate the chance that a table game card programmed to duplicate the best recent performance could equal or surpass the record.

The key quantity in all these calculations is what I've called the standard deviation distance, or SDD. It is just the number of SD's by which the best recent performance falls short of the record. For example, consider the category of home runs. The record, as we all know, is 61 by Roger Maris. The best total in the 1974-84 period was 52, by George Foster in 1977. Using the formula, together with his AB total for the year, we calculate Foster's home-run SD for that year to be 6.9.
He was nine homers short of the record, nine divided by 6.9 is 1.30, so his SDD was 1.30, i.e., he was 1.30 SD's short of the record. Using tables which are available in books on statistics, one can find the odds against something coming out 1.30 or more SD's above what it should be on the average. The answer is 10.3 to one, and these are the odds against a player hitting on the average like the Foster of '77 reaching 61 or more home runs. Putting it another way, it is the odds against a 1977 Foster table game card reaching 61 or more home runs in the same number of at-bats.

In the accompanying table are listed the modern season record, best in the 1974-1984 period, SD, SDD and odds against for ten offensive categories (hits, batting average, doubles, triples, home runs, total bases, slugging average, runs scored, runs batted in and stolen bases). The hitting streak has been included for good measure because it's often mentioned as a virtually unbreakable record. It is not susceptible to the SD approach, but the odds against can be calculated. For the best in 1974-84 in the hitting streak category, Rod Carew of 1977, who had the most hits per game during the period, was chosen because a high hits-per-game total gives the best chance of a long hitting streak.

In using the formula for SD, the stats for the player best in 1974-84 were used, with that player's actual number of successes and failures. Each time at bat was considered an attempt in all categories except runs scored, RBI and stolen bases. For these, each plate appearance was considered an attempt. For total bases and slugging average, the SD's for singles, doubles, triples and home runs had to be calculated and combined. For RBI, the number of plate appearances resulting in two or three RBIs had to be estimated, the SD's calculated, and these combined. For stolen bases, the possibility of one plate appearance leading to two or more stolen bases was ignored.
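The SDD and the table lookup for the Foster example can be sketched in a few lines. This assumes that the "tables which are available in books on statistics" are tables of the normal distribution, which the article does not state outright; the function names `upper_tail` and `sdd` are illustrative. The same lookup is then applied, under the same assumption, to the numbers of regulars and teams given earlier for a typical year.

```python
import math

def upper_tail(z):
    """Chance that a normally distributed result lands z or more SD's above its average."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def sdd(record, best_recent, sd):
    """Standard deviation distance: SD's by which best_recent falls short of record."""
    return (record - best_recent) / sd

# George Foster, 1977: 52 home runs with an SD of 6.9, against the record of 61.
foster_sdd = round(sdd(61, 52, 6.9), 2)   # rounded to two places, as in the text
foster_chance = upper_tail(foster_sdd)

print(f"Foster SDD: {foster_sdd:.2f}")
print(f"chance of 61 or more: about 1 in {1 / foster_chance:.1f}")

# Expected numbers of one-SD and two-SD fluctuations in one direction,
# using the article's counts of 200 regular hitters, 100 regular pitchers, 26 teams.
groups = {"regular hitters": 200, "regular pitchers": 100, "teams": 26}
expected = {
    name: (count * upper_tail(1), count * upper_tail(2))
    for name, count in groups.items()
}
for name, (one_sd, two_sd) in expected.items():
    print(f"{name}: {one_sd:.1f} at one SD or more, {two_sd:.1f} at two SD's or more")
```

Under the normal-table assumption, the Foster figure comes out at about 1 chance in 10.3, in line with the number quoted above, and the expected counts land close to the "about 30", "about 15" and "about four" figures given for a typical year.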
Because the best performance in a category in a ten-year period is probably itself a fluctuation, my values for odds against are probably conservative, but the rankings should be about right. Looking at the table, some readers may find some surprises, while others may just find previous opinions confirmed. For me, there was a little of both. The records divide themselves pretty well into four classes:

First, the records for runs, slugging average and triples, for which the odds against are several thousand to one, may to all intents and purposes be considered unbreakable under present conditions. They could only be endangered by some basic change in the way the game is played (such as a rule change greatly favoring the offense) or by the appearance of a player so extraordinary that he simply can't be judged by the same standards as even the greatest stars of the present (Babe Ruth in his day may have been such a player).

The records for RBIs and hitting streak, with odds against of a few hundred to one, are hard to break, but there is a slight chance if a player has a really outstanding season and also enjoys a large fluctuation.

Then there are five categories with the odds well under 100 to one, for which the chances of breaking might be rated as fair. It would not be too surprising if one of these records were to fall within the next decade.

Finally there is the record for stolen bases, which was set during the 1974-84 period and is thus by definition vulnerable to an outstanding player of that period.

The case of stolen bases emphasizes, however, how a change in the way the game is played can affect the vulnerability of records. If this same calculation had been done in 1940, about the time I first really began following baseball, a different situation would have been encountered. The record for stolen bases then was 96, held by Ty Cobb. The best total in the 1930s was 61 by Ben Chapman in 1931.
His SD was 7.45, and his SDD was 4.70, making the stolen-base record at that time even tougher to break than the runs scored now! And indeed, as long as managers' attitude toward the stolen base remained as it was in the '30s, Cobb's record

It is my hope that this article will stimulate interest in the use of