Stats Often Can Deceive
SIMPSON'S PARADOX
By Karl Hiester
A PARADOX, ACCORDING to Webster, is "a statement that seems contradictory, unbelievable or absurd but that may actually be true in fact." This is an apt description for several statements that could be made about the notorious split season of 1981. It is especially fitting for the well-publicized and lamentable fact that neither of the two National League teams with the best season won-lost percentages made the post-season playoffs that year. It will be recalled that neither the Cardinals (59-43, .578) nor the Reds (66-42, .611) were able to lead their divisions in either of the two halves of that season. The Cards were topped by the Phillies in the first half and by the Expos in the second half, while the Reds lost out to the Dodgers and Astros, respectively.
What we're seeing here is a variant of a little-known but remarkable statistical wrinkle called Simpson's paradox, named thus for the mathematician who gave it a careful examination. The pure and unvarnished version of this paradox is stranger yet.
Consider the pitcher who states to a fellow moundsman: "Sure you had a better won-lost percentage than I did last year, and okay, you did again this year, but my won-lost record was better over the two-year span!" Now, on the face of it, that claim appears to be contradictory and maybe even absurd, yet that is precisely the statement Tim Lollar of the San Diego Padres might have made to Oakland's Steve McCatty at the conclusion of the 1983 season.
Witness the numbers: In 1982 McCatty was 6-3 for a percentage of .667, and Lollar was 16-9 for .640. In 1983 McCatty's record was 6-9 (.400) and Lollar's was 7-12 (.368). McCatty had the better record each year, but note the overall records for 1982 and 1983 combined: McCatty 12-12 (.500), Lollar 23-21 (.523). So Lollar's seemingly paradoxical statement is nevertheless a true one!
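The pitchers' arithmetic is easy to verify by hand, but a short script makes the reversal explicit. The following is a minimal sketch in Python, using only the won-lost records quoted above; the names `records`, `pct` and `combined` are illustrative choices, not anything taken from the article.

```python
from fractions import Fraction

# Won-lost records quoted in the text: (wins, losses) per season.
records = {
    "McCatty": {1982: (6, 3), 1983: (6, 9)},
    "Lollar": {1982: (16, 9), 1983: (7, 12)},
}

def pct(wins, losses):
    """Won-lost percentage as an exact fraction."""
    return Fraction(wins, wins + losses)

def combined(seasons):
    """Pool the wins and losses of several seasons into one record."""
    wins = sum(w for w, _ in seasons.values())
    losses = sum(l for _, l in seasons.values())
    return wins, losses

for name, seasons in records.items():
    yearly = ", ".join(
        f"{year}: {float(pct(*rec)):.3f}" for year, rec in seasons.items()
    )
    w, l = combined(seasons)
    print(f"{name}: {yearly}, combined {w}-{l} {float(pct(w, l)):.3f}")
```

Run as written, the script should show McCatty ahead in each season and Lollar ahead once the two seasons are pooled, which is exactly the reversal described.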
Before attempting to shed some light on this little mystery, let us look at some batting averages (which are, in the precise sense, percentages) and see that Simpson's paradox shows up here, too. Thanks to the two-part breakdowns of season batting averages provided in Bill James' invaluable Baseball Abstract, we have the following comparison of the performances of Andre Dawson and Lee Lacy against right-handed and left-handed pitching in 1983:
We see that, while Dawson outhit Lacy against each kind of pitching, it was Lacy who ended up with the higher season average.
American Leaguers are by no means immune to Mr. Simpson's curious twist of arithmetic either. This time we will consider the 1983 batting records of Toronto's Damaso Garcia and the Yankees' Ken Griffey on grass and on artificial turf:
Notice that the relatively large differences between the separate averages - Garcia was outhit by 56 points on grass and by 31 points on artificial turf - make Damaso's higher average for the season seem even more unlikely.
Without being too mathematically burdensome, we will now try to see what lies behind these peculiar goings-on. The key thing to observe is that a hitter's batting average for an entire season is a weighted average of his batting averages for two (or possibly more) parts of the season. The weights are proportional to the numbers of at bats in each part of the season. Referring to the last example, we see that Griffey's lower average on grass receives almost seven times the weight of his higher artificial turf average. On the other hand, Garcia's higher average on turf gets nearly twice the weight that his lower average on grass receives. This proportional weighting is just sufficient to push Garcia's overall season average slightly ahead of Griffey's.
Finally, this hypothetical example may help to clarify the mechanism just described. We'll christen our batters "Babe" and "Casey" and examine their averages over two consecutive imaginary seasons:
The extreme figures here are preposterous, of course, but they illustrate the effect of the weightings very clearly.
So, while Simpson's paradox may be only a curiosity of passing interest, it does serve to remind us that sometimes things are not what they seem to be. Such reminders are especially meaningful to interpreters of data of whatever kind. If we are occasionally too rigid or complacent in our statistical dealings, we might see behind some page of figures the faint visage of Simpson gazing back at us from the mathematical twilight zone. It is easy to imagine him giving a sly wink to Messrs. Griffey, Dawson and McCatty, to the Cardinals and the Reds, and ultimately to all of us.
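The weighting mechanism can also be checked against the McCatty and Lollar records quoted at the start of the article. The sketch below, in Python, treats each season's decisions (wins plus losses) as the weight, by analogy with at bats for a hitter. The helper name `weighted_average` is an illustrative assumption and does not come from the article.

```python
from fractions import Fraction

def weighted_average(parts):
    """Combine (successes, opportunities) parts into one overall average.

    Each part's own average is weighted by its share of the total
    opportunities, so the result equals total successes / total opportunities.
    """
    total = sum(n for _, n in parts)
    return sum(Fraction(n, total) * Fraction(s, n) for s, n in parts)

# (wins, decisions) for 1982 and 1983, from the records quoted in the article.
mccatty = [(6, 9), (6, 15)]
lollar = [(16, 25), (7, 19)]

for name, parts in (("McCatty", mccatty), ("Lollar", lollar)):
    total = sum(n for _, n in parts)
    for wins, decisions in parts:
        weight = Fraction(decisions, total)
        print(f"{name}: average {wins / decisions:.3f} "
              f"carries weight {float(weight):.3f}")
    print(f"{name}: weighted average {float(weighted_average(parts)):.3f}")
```

The weights tell the story: McCatty's weaker 1983 season accounts for 15 of his 24 decisions, while Lollar's stronger 1982 season accounts for 25 of his 44, so each man's combined figure is pulled toward a different end of his own range.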