The Probability of the League Leader Batting .400
By Dallas Adams

In 1980 George Brett, while ultimately falling short, came close to hitting for a .400 average. The question naturally arises as to the probability of anyone hitting .400. The commonly held view nowadays is that night ball, transcontinental travel fatigue, the widespread use of top-quality relief pitchers, big ballparks, large fielders' gloves and other factors all act to a hitter's detriment and make a .400 average a near impossibility. But surely these factors affect all batters, not only the potential .400 hitters, and their net effect will therefore be reflected in the composite league batting average. If the league average is low, the chance of there being a .400 hitter is also low; a high league average means a higher chance of a .400 hitter.

Consider the experimental data. Figure 1 shows, for each major league season from 1901 through 1980, the average of each league's batting champion plotted against the league batting average. Of particular interest is the dashed line which marks the rather well-defined upper boundary of the data points. This line represents the ultimate level of batting performance in 80 years of major league baseball. Note that this boundary crosses the .400 level of individual performance at a league average of .255, which can be considered the effective minimum league level from which a .400 hitter can, historically, emerge.

For the present era, the minimum league level required is probably higher than .255. There is, for example, evidence that the gap in talent between the league's average and best players has been steadily lessening over time. Indeed, the points defining that upper boundary were all achieved in the deadball era.

For any given league batting average, the experimental probability of an individual .400 hitter could, if there were sufficient data, be obtained directly off Figure 1 by counting. For example, at a league average of .265 there was one season with a .400 hitter and three seasons without. Unfortunately, this simple approach is inadequate because the data are sparse: eleven .400 hitters spread over a range of .230 to .303 in league batting average. It is necessary, therefore, to group the data. For this study a moving average covering .009 points in league batting average was employed, meaning that the data for each specific league batting average were augmented by all the data within ±.004 points. Thus for a .265 league average, by way of example, the 29 data points in the range .261 through .269 are used, rather than only the four data points at exactly .265. Ranges above .288 contained ten or fewer data points and were considered insufficiently populated to be included in the calculations. Despite the smoothing effect of the moving-average technique, there remains some jumping about in the resultant experimentally determined probabilities, but the general trend is apparent, as shown by the individual points on Figure 2.
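To make the grouping concrete, here is a minimal sketch of the counting procedure just described. The season data and the function name are hypothetical placeholders (the study itself used every major league season from 1901 through 1980); only the ±.004-point window follows the text.

```python
# A sketch of the moving-average counting described above, on placeholder data.
league_avgs = [0.265, 0.262, 0.268, 0.266, 0.261, 0.283, 0.269]  # league batting averages
champ_avgs  = [0.401, 0.390, 0.393, 0.398, 0.376, 0.382, 0.388]  # batting champions' averages

def experimental_prob_400(target, league_avgs, champ_avgs, window_pts=4):
    """Fraction of seasons whose league average lies within +/- window_pts
    (thousandths of a point) of `target` and whose champion hit .400 or better."""
    target_pts = round(target * 1000)
    in_window = [(lg, ch) for lg, ch in zip(league_avgs, champ_avgs)
                 if abs(round(lg * 1000) - target_pts) <= window_pts]
    if not in_window:
        return None  # no seasons near this league average (the study also excluded thinly populated ranges)
    return sum(ch >= 0.400 for _, ch in in_window) / len(in_window)

print(experimental_prob_400(0.265, league_avgs, champ_avgs))  # 1 of the 6 nearby seasons -> about 0.17
```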
From a more theoretical point of view, consider, for example, a league batting average of .265; .400 is 51% higher than .265. Thus the question: what is the probability of a player compiling a personal batting average at least 51% higher than his league's .265 average? At this juncture it is necessary to introduce the "Relative Batting Average" concept of Shoebotham (1976 Baseball Research Journal, pages 37-42). In its simplest form, a relative batting average is a player's average divided by his league's average.

If one calculates the relative batting average for all major league batting champions from 1901 through 1980, the results approximate a normal distribution (the familiar "bell-shaped curve") with a mean (average) value of 1.361 and a standard deviation (a measure of the dispersion of data about the mean) of 0.075. The useful thing about a normal distribution of known mean and standard deviation is that the probability of occurrence of any arbitrary value, above or below the mean, can be calculated. For a league average of .265, we want the probability of a player compiling an average 51% higher, a relative batting average of 1.51; the computation gives a 2.4% probability. Similar computations have been made for a wide range of league batting averages, and the resulting theoretical probabilities are shown by the solid line on Figure 2. The theoretical and experimental results are in good agreement.

Thus we have two approaches for examining the historical odds which George Brett was challenging. The 1980 American League batting average was .269. From Figure 2, the experimental data points at and near .269 indicate about a 10% chance of a .400 hitter under such conditions. The theoretical probability is even less optimistic: a 4.7% chance.

The long odds against Brett in 1980 help illustrate why there has not been a .400 average in the major leagues since 1941. The odds lengthen appreciably as league batting averages shrink below .269. In the 39 years since 1941, the American League batting average has bettered .269 only twice, and the National League has never done so. If the theoretical probabilities of Figure 2 are used, the calculations reveal a 51% chance of there NOT being a .400 hitter in any of the past 39 years.
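The normal-distribution calculation, and the way individual seasons combine into the 39-year figure, can be sketched along the following lines. The mean (1.361) and standard deviation (0.075) of the champions' relative batting averages are taken from the text; the post-1941 league averages in the second part are placeholders rather than the actual 1942-1980 figures, so the final product is purely illustrative.

```python
# A sketch of the theoretical probability calculation described above.
from statistics import NormalDist

rel_ba = NormalDist(mu=1.361, sigma=0.075)   # champions' relative batting averages

def theoretical_prob_400(league_avg):
    """Probability that the batting champion reaches the relative average
    needed for a .400 season, given the league batting average."""
    needed = 0.400 / league_avg              # e.g. about 1.51 when the league hits .265
    return 1.0 - rel_ba.cdf(needed)

print(round(theoretical_prob_400(0.265), 3))  # about 0.024, the 2.4% figure above
print(round(theoretical_prob_400(0.269), 3))  # about 0.046, in line with the 4.7% figure above

# The chance of NO .400 hitter across a set of league-seasons is the product of
# the single-season "no .400 hitter" probabilities. Placeholder averages only.
post_1941_avgs = [0.249, 0.253, 0.256, 0.258, 0.261, 0.256]
p_none = 1.0
for avg in post_1941_avgs:
    p_none *= 1.0 - theoretical_prob_400(avg)
print(round(p_none, 2))
```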