Saturday, February 7, 2009

What is Sabermetrics?

Mike Lovell - talkin'Giants'baseball
The Giants management team now under the leadership of Bill Neukom has adopted Sabermetrics as a player assessment tool. This is an excellent article that explains Sabermetrics.

Microsoft influence?

A thought... wouldn't it be interesting if Neukom used his Microsoft relationship to have the Giants join the Microsoft conglomerate and infuse money into the club's operation? Certainly, in theory, the Giants would have the income to compete with the "big market" teams like the Yankees, Mets, Red Sox, Cubs, Angels and the hated Dodgers.



Stephen Tomlinson http://www.stephent.com
What is Sabermetrics?
"Baseball men have not yet reached the revelation of Sir Francis Bacon, which was in essence that since all men live in darkness, who believes something is not a test of whether it is true or false. I have spent years trying to get people to ask simple questions: What is the evidence, and what does it mean?"

Bill James, This Time Let's Not Eat the Bones, p. 434

In his last Baseball Abstract (1988), Bill James defined "Sabermetrics" as "the search for objective knowledge about baseball". Saber comes from SABR, the acronym for the Society for American Baseball Research. Some people spell Sabermetrics as Sabrmetrics.

Examples of sabermetric questions: How much does playing in Colorado help a hitter? Could Alex Rodriguez' amazing rookie season have been predicted from his minor league stats? Are players as good at ages 30-34 as they are at ages 25-29?

Below I outline the sabermetric findings that I think are most likely to help major league general managers, but don't seem to be known to a lot of them, including the Jays' Gord Ash.

Runs Produced Formulas

There are formulas for estimating the number of runs produced by a hitter, and they heavily depend on On-Base Percentage (OBP) and Slugging Percentage (SLG). Batting Average does not correlate as well with run production, and RBIs and Runs Scored can be misleading because they depend on how good the players around you in the lineup are. A simple formula that works well is

  Runs = OBP * SLG * AB

or, almost equivalently

  Runs = ( Hits + Walks ) * ( Total Bases ) / ( At Bats + Walks )

though I list more accurate formulas on a separate page. We believe these formulas are accurate because, when they are applied to any major league team's stats, they come very close to the actual number of runs scored by that team.
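
Here is a minimal sketch of the two simple formulas in Python. The function names are made up for illustration, and the three sample lines are the 1996 OBP, SLG and AB figures quoted in the table further down in this article.

```python
def simple_runs(obp: float, slg: float, at_bats: int) -> float:
    """Estimate runs produced with the simple formula Runs = OBP * SLG * AB."""
    return obp * slg * at_bats


def simple_runs_from_counts(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """Almost-equivalent form: (Hits + Walks) * Total Bases / (At Bats + Walks)."""
    return (hits + walks) * total_bases / (at_bats + walks)


# 1996 lines as quoted in this article: (name, OBP, SLG, AB).
for name, obp, slg, ab in [
    ("Olerud", 0.369, 0.472, 398),
    ("Green", 0.332, 0.448, 422),
    ("Carter", 0.302, 0.475, 625),
]:
    print(f"{name}: {simple_runs(obp, slg, ab):.1f} estimated runs")
```

The results should land within a run or two of the ERP column in that table, which is about what you would expect from a quick approximation.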

If you divide a player's runs produced by the number of outs he made (e.g. take At Bats minus Hits), and multiply by 27 (the number of outs in a game), you can estimate the number of Runs Per Game (RPG) a lineup of that player would produce.
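
As a rough sketch of that calculation, assuming the At Bats minus Hits shortcut for outs (the function name and the sample batting line are hypothetical, not taken from any real player):

```python
OUTS_PER_GAME = 27  # outs a lineup gets in a nine-inning game


def runs_per_game(runs_produced: float, at_bats: int, hits: int) -> float:
    """Runs a lineup of this one player would score per 27 outs.

    Outs are approximated as At Bats minus Hits, the shortcut mentioned above.
    """
    outs = at_bats - hits
    if outs <= 0:
        raise ValueError("player must have made at least one out")
    return runs_produced / outs * OUTS_PER_GAME


# Hypothetical hitter: 80 runs produced, 500 at bats, 140 hits (360 outs).
print(round(runs_per_game(80, 500, 140), 2))
```

Counting outs more fully (caught stealing, double plays and so on) raises the out total and so lowers the figure a little.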

Easy stuff, right? But look at how the Jays evaluated their talent last year (1996). John Olerud and Shawn Green were publicly criticized, and in particular the Jays paid $5 million to the Mets to take John Olerud off their hands (though they picked up Robert Person in return). Meanwhile, Joe Carter, praised for his ability to drive in runs, was rewarded with a one-year contract extension and given Olerud's first-base job. Now look at how these players actually performed last year:

        Age Pos    BA   OBP   SLG   RPG  ERP  RUN  RBI   AB   BB   SB   HR   PA    $m
Olerud   27 1B   .274  .369  .472  6.19   69   59   61  398   60    1   18  458   6.5
Green    23 RF   .280  .332  .448  5.20   61   52   45  422   33    5   11  455    .3
Carter   36 LF   .253  .302  .475  4.91   89   84  107  625   44    7   30  669   6.5

The "RPG" number is the Estimated Runs Produced (ERP) per 27 outs: a lineup of Oleruds works out to 6.19 runs per game, a lineup of Greens to 5.20 runs per game, and a lineup of Carters to just 4.91 runs per game. The American League average last year was 5.39 runs per game. In other words, Olerud was an above-average hitter (though he didn't play a lot against left-handers), Green was about average, and Carter was below average. (Don't think 107 RBI means 107 runs produced; for example, if you hit a fly ball and a runner scores on the sacrifice fly, surely the hitter shouldn't get more than 1/4 of the credit for the run, and surely you shouldn't evaluate a hitter more highly just because he happens to get more chances to drive in runs than others.) Now, there's more to evaluating a player than his estimated runs produced per 27 outs, but if these numbers were posted on the scoreboard every day, i.e. if the Jays understood the value of Olerud's extra walks and the consequences of Carter's extra outs, do you think Carter would have been kept instead of Olerud? Surely not.

Note: All of the '96 Jays hitting stats, including the meaning of all the categories listed above, are provided on a separate page.

Attention general managers: always check the runs produced formula as part of your evaluation of a ball player.

Park Factors

You hear so often that Colorado has a great hitting team. Of course, people seem to know that Colorado's elevation helps the hitters, but then they still say Colorado has a great hitting team because they score so many more runs than everyone else.

Over the past two years at Coors Field (1995-96), teams have averaged 7.16 runs per game (that's each team; 14.3 runs per game have been scored in total). In Rockies' road games, 4.25 runs have been scored per team per game. So Coors Field appears to inflate scoring by an incredible 68%. So the Rockies' hitters, who hit there half the time, have runs produced stats inflated by about 34% (actually, since other teams get to play there 1/28th of the time, you could argue Rockies' stats are only inflated 32%, so below I will divide by 1.32).

32% is a big difference. A 100 RBI season, normally considered impressive, is really only a 75 RBI season. Batting averages are inflated by 10-15%. A .300 batting average for a Rockies' player is like .270 for someone else (Larry Walker's current .400 is like .360 somewhere else; still impressive, but .360 has been done lots of times in the past few decades, unlike .400).

At the '97 All-Star Break, the Rockies led the National League with roughly 5.8 runs scored per game. Assuming they've played half their games at Coors and the park factor has not changed, that's only like scoring 4.4 runs per game, which means the Rockies have only an average hitting team, ranking 6th or 7th in the National League. So why do so many people say the Rockies have a great hitting team? Does Rockies' management know they need to improve their hitting if they want to get back to the playoffs?
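
The Coors arithmetic above can be sketched in a few lines of Python. The function names are made up for illustration, and the 1.32 divisor is simply the figure settled on in the text, not something the code derives.

```python
def park_inflation(home_runs_per_game: float, road_runs_per_game: float) -> float:
    """Fractional increase in scoring at the home park relative to road games."""
    return home_runs_per_game / road_runs_per_game - 1.0


def neutralize(stat: float, season_inflation: float) -> float:
    """Deflate a season total or rate that was inflated by the given fraction."""
    return stat / (1.0 + season_inflation)


coors = park_inflation(7.16, 4.25)   # roughly 0.68
half_season = coors / 2              # roughly 0.34, since half the games are at home
ROCKIES_INFLATION = 0.32             # the slightly lower figure used in the text

print(f"Coors inflation: {coors:.0%}, half-season effect: {half_season:.0%}")
print(f"100 RBI becomes {neutralize(100, ROCKIES_INFLATION):.0f}")
print(f"5.8 runs per game becomes {neutralize(5.8, ROCKIES_INFLATION):.1f}")
```

Treating the adjustment as a plain division is the same simplification the text makes; batting average, which the text says is inflated by only 10-15%, would need its own smaller factor.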

Coors Field is the most extreme case. The next biggest park factor is Dodger Stadium's, which reduces scoring by roughly 28% (so, since Dodger hitters play half their games there, multiply Dodger players' estimated runs produced by 1.14 to correct for this). SkyDome turns out to be a neutral American League park (some people say SkyDome is a hitters' park, but then so are most American League parks). My park data numbers are taken from the 1997 Stats Major League Handbook, p. 297.

Attention general managers: always, always, always check the park factor before evaluating your own or other teams' talent.

Interpreting Minor League Stats

Minor league stats are as useful as major league stats for predicting future performance. Of course, you have to translate the minor league stats according to park factor and for the tougher competition in the major leagues. But once you do this you can easily see whether the guy hitting well at Triple-A is hitting as well as or better than the (typically more expensive) guy on the major league team.

"In my opinion, this is the most important thing that I learned in my years of studying sabermetrics in terms of its potential ability to help a baseball team."

Bill James, This Time Let's Not Eat the Bones, p. 475

Do the Jays have any minor leaguers whose stats are better than their major league counterparts? They sure do. Centre-fielder Shannon Stewart's '96 stats translate to 6.0 runs per game, better than any Jays' outfielder of a year ago. Third-baseman Tom Evans' '96 stats translate to 5.9 runs per game, compared to 5.4 runs per game for Ed Sprague. Second-baseman Jeff Patzke's translation of 4.7 runs per game is close to Carlos Garcia's 4.9 of a year ago. Do the Jays know that they could probably have improved their offense and saved millions of dollars with a few callups? All of these guys are young and are hitting well in Syracuse this year. Meanwhile, Joe Carter, Ed Sprague and Carlos Garcia are making a collective $10.8 million and are having subpar years as of the '97 All-Star Break. The good news is that the Jays have not traded away their top hitting prospects. One suspects if the Jays had the Expos' budget, they'd be playing these guys in the majors right now, and like the Expos would have a winning team.

Note: my translations were taken from the 1997 Baseball Prospectus. It provides a park-adjusted, league-difficulty adjusted number called "equivalent average" for all major league and many minor league players, which I translated to a "Jays' runs per game" number by squaring and multiplying by 79. (By the way, Alex Rodriguez' amazing '96 stats matched almost exactly the translations of his '95 season in Tacoma.) Bill James' detailed article on translating minor league stats is in his 1985 Baseball Abstract; he figured players lose about 18% of their offensive ability when moving from Triple-A to the majors.
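
The squaring-and-multiplying step in that note can be sketched as follows. The function name is hypothetical and the equivalent averages in the loop are made-up sample values, not figures from Baseball Prospectus.

```python
JAYS_RPG_SCALE = 79  # multiplier used in the note above


def eqa_to_runs_per_game(equivalent_average: float, scale: float = JAYS_RPG_SCALE) -> float:
    """Convert an equivalent average to runs per game by squaring and scaling."""
    return equivalent_average ** 2 * scale


# Made-up sample equivalent averages, for illustration only.
for eqa in (0.240, 0.260, 0.280):
    print(f"EqA {eqa:.3f} -> {eqa_to_runs_per_game(eqa):.1f} runs per game")
```

Because the average is squared, small differences in equivalent average turn into noticeably larger differences in runs per game.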

Attention general managers: always check what you have in your minor league system, or what you can acquire from someone else's system, before investing in "proven" players.

Age

On average, players tend to improve until age 27, and decline after age 27.

Bill James' detailed study is in his 1982 Baseball Abstract (which I don't have) but he summarizes the findings in This Time Let's Not Eat the Bones, p. 460:

  • "Almost every accomplishment (twenty-win seasons, hundred-RBI seasons, . . . ) is more common at age 27 than any other age." (Pat Hentgen won his Cy Young award at age 27 last year.)
  • "The peak period for ballplayers is not twenty-eight to thirty-two, as was once believed, but twenty-five to twenty-nine."
  • "All players as a group retain 77 percent of their peak value at the age of thirty, and barely over one-half of their peak value (53 percent) at the age of thirty-two." (I'm not sure what his measure of "value" is here, but I'm guessing it's relative to the typical Triple-A player). There's a note on "important differences" for "superstars" because their major league careers are so much longer, but no details are given in this summary.
  • "Contrary to popular belief, power pitchers age more slowly and last much longer than do 'finesse' or 'control' type pitchers."

The age 27 finding applied to all groups studied except knuckleball pitchers and players "specifically selected because they had their best years at some other age." No one is claiming that every player steadily improves to age 27 and steadily declines thereafter; in fact, almost none fit the pattern that precisely. These are averages over groups of players.

The Jays for the '97 season added Dan Plesac (age 35), Roger Clemens (age 34), Benito Santiago (age 32), Orlando Merced (age 30) and Carlos Garcia (age 29, he claims). They also extended the contract of Joe Carter (age 37) and tried Ruben Sierra (age 31) when they discovered they had hitting problems. Last year, and for a couple months this year, they gave significant playing time to Jacob Brumfield (age 32) and Juan Samuel (age 36). Also last year, Otis Nixon (now age 38) and Erik Hanson (age 32), were given multi-year contracts. Overall, the Jays' investment in older players has had supposedly "disappointing" results, but really these results are the kind that should have been expected based on the performance-age pattern found by James. It should not be surprising that two of the Jays' three above-average hitters have turned out to be Carlos Delgado (age 25) and Shawn Green (age 24), and that the dumped John Olerud (age 28) is having a great year with the Mets.

It's not hard to see how a poor team, like the Expos, can have "surprising success" every year, while a rich team, like the Jays, or the Yankees of the 80's, can have "disappointing" results every year. The poor team has almost no choice but to invest in "unproven" players, but in fact their minor league numbers do prove something, and because they are young, they normally improve for a few years. The rich team is tempted to invest in "proven" free-agents, but because they are over 30, they normally don't perform as well as in the glory years associated with them. Of course, a "smart" rich team, like the Jays under Pat Gillick, has an advantage over the poor teams because it won't have to let go of a good young player still in his prime. The good news for Jays fans is that the Jays have at least been smart enough to keep most of their good young players (not counting Olerud and Alomar), especially their good minor leaguers, so the Jays are in position to have a winning team essentially as soon as they are ready to stop playing the under-performing veterans.

Attention general managers: always check on which side of 27 a player's age is when evaluating the player.

Pythagorean Formula

You can predict a team's winning percentage by taking its runs scored squared and dividing by the sum of its runs scored squared and its runs allowed squared (got that?). This is known as the Pythagorean Formula:

   WPCT = RF^2 / ( RF^2 + RA^2 )

I've read that 1.83 has been found to be a slightly better exponent to use than 2, but it doesn't matter much. For example, last year (1996) the Jays scored 766 runs and allowed 809, for which the Pythagorean Formula predicts a .473 winning percentage, or a 77-85 record (the Jays actually played .457 ball, for a 74-88 record, not too far off their projection).
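
A short Python sketch of the formula, using the 1996 Jays run totals quoted above (the function name is made up, and the exponent is left as a parameter so you can try 1.83 as well):

```python
def pythagorean_wpct(runs_for: float, runs_against: float, exponent: float = 2.0) -> float:
    """Projected winning percentage: RF^x / (RF^x + RA^x)."""
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


GAMES = 162

# 1996 Blue Jays: 766 runs scored, 809 allowed.
wpct = pythagorean_wpct(766, 809)
wins = round(wpct * GAMES)
print(f"exponent 2:    {wpct:.3f} -> {wins}-{GAMES - wins}")

# Same inputs with the alternative exponent mentioned above.
print(f"exponent 1.83: {pythagorean_wpct(766, 809, exponent=1.83):.3f}")
```

Running both versions side by side shows how little the choice of exponent changes the projection for a team like this one.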

This formula provides the link between run production (discussed in detail above) and wins (which is the point, after all). For example, if we want to figure how many wins the Jays lost by keeping Carter instead of Olerud, we'd figure the number of extra runs that Olerud would have produced (23 runs at the '97 All-Star Break in almost the same number of plate appearances, and this doesn't take into account that Shea Stadium is a bit of a pitchers' park), and then compare the Pythagorean projections for the Jays with and without the extra 23 runs (.452 without, .487 with). That leads to a difference of 2.9 wins over 83 games, so probably 5-6 wins by the end of the year. Now, the formulas are imperfect tools that don't account for everything, and a lot of assumptions are made when using them. So if the Jays finish 5 games out of a playoff spot, you will have to decide for yourself whether keeping Carter instead of Olerud was the reason.
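
The last step of that example, going from two projected winning percentages to wins, is plain arithmetic. This sketch takes the .452 and .487 projections from the text as given and does not recompute them; the function name is made up.

```python
def extra_wins(wpct_without: float, wpct_with: float, games: int) -> float:
    """Wins gained over a number of games from a change in winning percentage."""
    return (wpct_with - wpct_without) * games


# Projections quoted in the text: .452 without the extra 23 runs, .487 with them.
so_far = extra_wins(0.452, 0.487, 83)       # games played at the '97 All-Star Break
full_year = extra_wins(0.452, 0.487, 162)   # assumes the same gap holds all season

print(f"{so_far:.1f} wins over 83 games, {full_year:.1f} over 162")
```

The full-season figure assumes the gap in projected winning percentage stays the same for the rest of the year, which is where the 5-6 win estimate comes from.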

Attention general managers: compute the estimated number of wins a player will add to your team before spending millions.

Bridging the Gap

"The evolution of statistical information about baseball, progressing nicely from 1869 to 1955, was frozen solid for a generation afterward."

Bill James, This Time Let's Not Eat the Bones, pp. 453-4

Why aren't the findings of sabermetrics applied more by baseball people? I remember back in the early-80's listening to Tony Kubek on TV talking about Bill James. (By the way, I think Kubek was the best regular Jays' analyst of all-time.) Kubek was commenting on a recent Bill James article that asked something like "who is Alfredo Griffin to be keeping Tony Fernandez out of the big leagues?". Kubek wasn't impressed by all the numbers and liked to point out all the little things Griffin did to help his team. Of course, when Fernandez was called up, he immediately played as well as James predicted, and was a big part of the great Jays' teams of the mid-80's.

I think that, like Kubek, most baseball people have at least heard of the major sabermetric findings, and first heard of them years ago. But I think a lot of them don't want to believe that the tools are that useful; that would mean their years of baseball "experience" aren't quite as valuable. And since the numbers obviously can't tell the entire story, they can always assume they know better. It's easy to get away with this because almost no one is challenging them on the point.

But that's why I think there could be a paradigm shift soon, in which ordinary baseball broadcasters, writers, players and managers suddenly refer more to these findings. The reason I think so is the Internet and the World Wide Web. Because of the web, millions of "ordinary" fans are going to be exposed to sabermetric findings in the near future, and once enough of them see the obvious, that there are tools of real value here, they will pressure real baseball people to answer for their decisions that go against these findings. I wouldn't be surprised if teams soon have a resident "sabermetrician" advising them along with their scouts at their strategic planning meetings.

If this breakthrough for sabermetrics occurs, mediocre veteran players will be the big losers, talented young players will be the big winners, and the fans will see higher-quality play. Park-adjusted "equivalent averages" may be updated and displayed every night in place of batting averages. Of course, the poorer teams would suffer if the rich teams smartened up, but the good news is that they can still compete, and they can always try to grab a prospect from an impatient rich team now and then. To maintain equity, salary caps aren't needed, but it would be a retrograde step if the amateur draft were abolished or players could become free agents in less than six years.

References

  • The Bill James Baseball Abstract, annual publication from 1982-1988 of Ballantine Books. These books are still a great read years later. A lot more is covered than just evaluating players.
  • Bill James, This Time Let's Not Eat the Bones, 1989, Random House. Reprints a lot of articles from the abstracts, but with the numbers taken out. Probably the best ever one-book summary of sabermetric knowledge.
  • STATS Major League Handbook 1997, November 1996, STATS Publishing. Great annual reference book, has player projections, park data.
  • Gary Huckabay, Clay Davenport, Rany Jazayerli, Chris Kahrl, Joseph S. Sheehan, Baseball Prospectus, 1997 Edition, 1997, Ravenlock Media. Has commentary on every major leaguer and most minor leaguers of note. It has become my favorite reference this year.
