PECOTA
{{short description|Sabermetric system for forecasting baseball player performance}}
PECOTA, an acronym for Player Empirical Comparison and Optimization Test Algorithm,{{Cite web|url=http://www.baseballprospectus.com/glossary/index.php?search=PECOTA|title=Baseball Prospectus {{!}} Glossary|website=www.baseballprospectus.com|access-date=2016-05-05}} is a sabermetric system for forecasting Major League Baseball player performance. The word is a backronym based on the name of journeyman major league player Bill Pecota, who, with a lifetime batting average of .249, is perhaps representative of the typical PECOTA entry. PECOTA was developed by Nate Silver in 2002–2003 and introduced to the public in the book Baseball Prospectus 2003.Nate Silver, "Introducing PECOTA," in Gary Huckabay, Chris Kahrl, Dave Pease et al., Eds., Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507–514. Baseball Prospectus (BP) has owned PECOTA since 2003; Silver managed PECOTA from 2003 to 2009. Beginning in Spring 2009, BP assumed responsibility for producing the annual forecasts, making 2010 the first baseball season for which Silver played no role in producing PECOTA projections.Nate Silver and Kevin Goldstein, "State of the Prospectus: Spring 2009," [http://baseballprospectus.com/article.php?articleid=8653 BaseballProspectus.com, March 24, 2009] {{webarchive|url=https://web.archive.org/web/20090327063757/http://www.baseballprospectus.com/article.php?articleid=8653 |date=March 27, 2009 }}.
PECOTA is one of several widely publicized statistical systems for forecasting player performance, and BP markets its player forecasts as a fantasy baseball product. Since 2003, annual PECOTA forecasts have been published both in the Baseball Prospectus annual books and, in more detailed form, on the BaseballProspectus.com subscription-based website.Illustrative PECOTA estimates and "cards" are available for inspection by nonsubscribers here: http://www.baseballprospectus.com/pecota/. PECOTA also inspired some analogous projection systems for other professional sports: KUBIAK for the National Football League, SCHOENEKevin Pelton, "Introducing SCHOENE: Our NBA Projection System," [http://www.basketballprospectus.com/article.php?articleid=726 BasketballProspectus.com (October 20, 2008)] and CARMELO{{cite web|url=https://fivethirtyeight.com/features/how-were-predicting-nba-player-career/ |title= We're Predicting The Career Of Every NBA Player. Here's How |first=Nate |last=Silver |authorlink=Nate Silver |publisher=FiveThirtyEight |date=October 9, 2015 |access-date=February 29, 2016}} for the National Basketball Association, and VUKOTAThomas Awad, "Introducing VUKOTA," [http://www.puckprospectus.com/unfiltered/?p=68 PuckProspectus.com (July 20, 2009)]. for the National Hockey League.
PECOTA forecasts a player's performance in all of the major categories used in typical fantasy baseball games; it also forecasts production in advanced sabermetric categories developed by Baseball Prospectus (e.g., VORP and EqA). In addition, PECOTA forecasts several summary diagnostics such as breakout rates, improve rates, and attrition rates, as well as the market values of the players. The logic and methodology underlying PECOTA have been described in several publications, but the detailed formulas are proprietary and have not been shared with the baseball research community.{{Citation needed|date=August 2022}}
Methodology
Silver described the inspiration for his approach as follows:
The basic idea behind PECOTA is really a fusion of two different things –[Bill] James's work on similarity scores and Gary Huckabay's work on Vlad, [Baseball Prospectus's] previous projection system, which tried to assign players to a number of different career paths.Gary Huckabay, "6–4–3: Reasonable Person Standard," [http://www.baseballprospectus.com/article.php?articleid=1581 BaseballProspectus.com, August 2, 2002]. I think Gary used something like thirteen or fifteen separate career paths, and all that PECOTA is really doing is carrying that to the logical extreme, where there is essentially a separate career path for every player in major league history. The comparability scores are the mechanism by which it picks and chooses from among those career paths.Rich Lederer, "An Unfiltered Interview with Nate Silver," [http://baseballanalysts.com/archives/2007/02/an_unfiltered_i.php Baseball Analysts, February 12, 2007.]
=Comparable players=
PECOTA relies on fitting a given player's past performance statistics to the performance of "comparable" Major League ballplayers by means of similarity scores. As is described in the Baseball Prospectus website's glossary:{{Cite web|url=https://legacy.baseballprospectus.com/glossary/index.php?mode=viewstat&stat=38|title=Baseball Prospectus | Glossary|website=legacy.baseballprospectus.com}}
PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II. In addition, it also draws upon a database of roughly 15,000 translated minor league seasons (1997–2006) for players that spent most of their previous season in the minor leagues. ... PECOTA considers four broad categories of attributes in determining a player's comparability:{{Citation needed|date=August 2022}}
1. Production metrics – such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.
2. Usage metrics, including career length and plate appearances or innings pitched.
3. Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).
4. Fielding Position (for hitters) or starting/relief role (for pitchers).
... In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
PECOTA uses nearest neighbor analysis to match the individual player with a set of other players who are most similar to him. Although drawing on the underlying concept of Bill James' similarity scores, PECOTA calculates these scores in a distinct way that leads to a very different set of "comparables" than James' method.This difference is explained and illustrated in Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003, cited above. Furthermore, Silver describes the following distinct feature:
The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance. Thus, we might look at what a pitcher did from ages 35–37, and compare that against the most similar age 35–37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at [https://www.baseball-reference.com/ baseball-reference.com] or in other places, which attempt to evaluate the totality of a player’s career up to a given age.http://www.baseballprospectus.com/unfiltered/?p=136. Also see Baseball Prospectus' glossary entry for [http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=38 "Comparable Players"].
Once a set of "comparables" is determined for each player, his future performance forecast is based on the historical performance of his "comparables". For example, a 26-year-old's forecast performance in the coming season will be based on how the most comparable Major League 26-year-olds performed in their subsequent season.
Separate sets of predictions are developed for hitters and pitchers.
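The comparables approach described in this section can be illustrated with a minimal sketch. PECOTA's actual formulas are proprietary, so everything here is an assumption made for illustration: the three attributes, the scaling factors, the inverse-distance weighting, and the function names `distance`, `find_comparables` and `forecast` are all hypothetical. The sketch shows only the general idea of a nearest-neighbor search that widens its tolerance until enough comparables are found, then forecasts from what those comparables did in their following season.

```python
import math

def distance(a, b, scales):
    """Scaled Euclidean distance between two attribute tuples."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def find_comparables(target, pool, scales, tolerance=0.5, min_count=3, step=0.5):
    """Return (distance, next_season_value) pairs within `tolerance` of the
    target, widening the tolerance until at least `min_count` are found.
    `pool` holds (attributes, next_season_value) pairs; all hypothetical."""
    if not pool:
        raise ValueError("pool of historical seasons is empty")
    needed = min(min_count, len(pool))
    scored = sorted((distance(target, feats, scales), nxt) for feats, nxt in pool)
    while True:
        comps = [(d, nxt) for d, nxt in scored if d <= tolerance]
        if len(comps) >= needed:
            return comps
        tolerance += step  # "cheat" by accepting less similar players

def forecast(target, pool, scales, **kwargs):
    """Distance-weighted average of the comparables' next-season values.
    The 1 / (1 + d) weighting is an illustrative choice, not PECOTA's."""
    comps = find_comparables(target, pool, scales, **kwargs)
    weights = [1.0 / (1.0 + d) for d, _ in comps]
    return sum(w * nxt for w, (_, nxt) in zip(weights, comps)) / sum(weights)
```

In this sketch each pool entry pairs a historical player's attributes at a given age with his result in the subsequent season, so a forecast for a 26-year-old would be built from a pool of 26-year-old seasons.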
=Peripheral statistics=
PECOTA also relies heavily on peripheral statistics to forecast a given player's future performance. For example, drawing on insights from defense-independent pitching statistics, PECOTA forecasts a pitcher's future performance in a given area by using information about his past performance in other areas.See PERA for an example of the use of peripheral statistics to estimate a performance. As baseball analyst and journalist Alan Schwarz writes, "Silver ... designed a sophisticated variance algorithm that has examined every big-league pitcher's statistics since 1946 to determine which numbers best forecast effectiveness, specifically earned run average. His findings are counterintuitive to most fans. 'When you try to predict future E.R.A.'s with past E.R.A.'s, you're making a mistake,' Silver said. Silver found that the most predictive statistics, by a considerable margin, are a pitcher's strikeout rate and walk rate. Home runs allowed, lefty-righty breakdowns and other data tell less about a pitcher's future".Alan Schwarz, "Numbers Suggest Mets Are Gambling on Zambrano," New York Times, August 22, 2004.
=Probability distributions=
Instead of focusing on making point estimates of a player's future performance (such as batting average, home runs, and strike-outs), PECOTA relies on the historical performance of the given player's "comparables" to produce a probability distribution of the given player's predicted performance during the next five years. Alan Schwarz has emphasized this feature of PECOTA: "What separates Pecota from the gaggle of projection systems that outsiders have developed over many decades is how it recognizes, even flaunts, the uncertainty of predicting a player's skills. Rather than generate one line of expected statistics, Pecota presents seven – some optimistic, some pessimistic – each with its own confidence level. The system greatly resembles the forecasting of hurricane paths: players can go in many directions, so preparing for just one is foolish".Alan Schwarz, "Predicting Futures in Baseball, and the Downside of Damon," New York Times, November 13, 2005. Silver has written,
This procedure requires us to become comfortable with probabilistic thinking. While a majority of players of a certain type may progress a certain way – say, peak early – there will always be exceptions. Moreover, the comparable players may not always perform in accordance with their true level of ability. They will sometimes appear to exceed it in any given season, and other times fall short, because of the sample size problems that we described earlier.
PECOTA accounts for these sorts of factors by creating not a single forecast point, as other systems do, but rather a range of possible outcomes that the player could expect to achieve at different levels of probability. Instead of telling you that it's going to rain, we tell you that there's an 80% chance of rain, because 80% of the time that these atmospheric conditions have emerged on Tuesday, it has rained on Wednesday.{{Citation needed|date=August 2022}}
Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history. However, it is also leaps and bounds more representative of reality, and more accurate to boot.Nate Silver, "Baseball Prospectus Basics: The Science of Forecasting," [http://www.baseballprospectus.com/article.php?articleid=2659 BaseballProspectus.com, March 11, 2004].
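The idea of reporting a range of outcomes rather than a single point can be sketched as follows. This is a simplified illustration, not PECOTA's method: it assumes the forecast range is read directly off the distribution of the comparables' subsequent-season results, and the seven percentile levels and the names `percentile` and `forecast_range` are assumptions chosen to echo the "seven lines" Schwarz describes.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q between 0 and 100) of a list."""
    if not values:
        raise ValueError("no comparable outcomes supplied")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def forecast_range(comparable_outcomes, levels=(10, 25, 40, 50, 60, 75, 90)):
    """Map each percentile level to the outcome at that level.
    The seven levels are an illustrative assumption."""
    return {q: percentile(comparable_outcomes, q) for q in levels}
```

A low percentile then corresponds to a pessimistic line and a high percentile to an optimistic one, with the 50th percentile serving as the median forecast.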
=Team effort=
Although Silver was the creator of PECOTA, producing PECOTA forecasts was a team effort: "I might be 'the PECOTA guy,' but it very much is a team effort," Silver has said of the BP staff. "We all do it. It's my baby, but it takes a village to run a PECOTA".William Hageman, "Baseball By the Numbers," Chicago Tribune, January 4, 2006. For example, PECOTA draws on Clay Davenport's translations (the so-called Davenport Translations or DT's) of minor league and international baseball statistics to estimate the major league equivalent performance of each player.See Clay Davenport, "DT's vs. MLEs — A Validation Study," [http://baseballprospectus.com/article.php?articleid=49 BaseballProspectus.com, January 30, 1998]; Clay Davenport, "Winter and Fall League Translations: Just How Good Are These Leagues, Anyway?," [http://baseballprospectus.com/article.php?articleid=2528 BaseballProspectus.com, January 27, 2004]; and Clay Davenport, "Over There! A Second Review of Translating Japanese Statistics, and Translating the Mexican League," Baseball Prospectus 2004 (New York: Workman, 2004): 585–590. In this way, PECOTA is able to make projections for more than 1,600 players each year, including many players with little or no prior major league experience.
The 2009 preseason forecasts were the last ones for which Silver took primary responsibility. In March 2009, Silver announced that PECOTA's extremely complex and laborious set of database manipulations and calculations would be moving to a different platform. Although Baseball Prospectus had been the owner of PECOTA since Silver sold it to them in 2003 – and Silver stewarded and took responsibility for the forecasts – henceforth PECOTA forecasts would be generated by the Baseball Prospectus team, initially with Clay Davenport in charge of the effort,See, for example, Clay Davenport, "Depth Charts," [http://baseballprospectus.com/unfiltered/?p=1264 BaseballProspectus.com, May 13, 2009]. and later, through the 2013 season, with Colin Wyers heading up both production and improvements in PECOTA.
Alternative forecasting systems
Most of the other popular forecasting systems do not use a "comparable players" approach. Instead most rely on direct projections from a player's past performance to his future performance, typically by using as a baseline a weighted average of a player's performance in his previous three years. Like PECOTA, many of those systems also adjust the projections for aging, park effects and regression toward the mean. Like PECOTA, they may also adjust for the competitive difficulty of each of the two major leagues.PECOTA's aging adjustment is implicit in the path of "future" performance of the set of historical "comparable" players. The systems differ from one another, however, in the types and intensities of age adjustments, regression-effect estimates, park adjustments, and league-difficulty adjustments that they may make as well as in whether they use similarity scores.Among the current major alternative statistically based projection systems are Tom Tango's Marcel projections (available and documented for 2007 at [http://www.hardballtimes.com/main/article/weve-got-the-2007-marcels/ The Hardball Times]); Diamond Mind Baseball, also described in an [https://www.espn.com/mlb/preview07/news/story?id=2820932 ESPN.com article] on 2007 team projections; Ron Shandler's [http://www.baseballhq.com/ Baseball HQ] and his annual book, Baseball Forecaster; The Hardball Times pre-season forecasts, inaugurated with the 2007 season; [http://lanaheimangelfan.blogspot.com/ Chone Smith's] "Chone Projections," reported on the website of [http://fangraphs.com Fangraphs.com]; [http://www.baseballinfosolutions.com/bjh_07_regular.html Baseball Info Solutions – BIS]; and Dan Szymborski's [http://www.baseballthinkfactory.org/files/oracle/discussion/2007_zips_projection_disk_build_2/ "ZiPS"] Projections. For a list of well-known forecasting systems as of 2014, including "Steamer," see this summary by Fangraphs: [http://www.fangraphs.com/library/principles/projections/ "Projection Systems"]. 
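The direct-projection approach described above can be sketched in a few lines. The weights (5, 4, 3), the league rate of 0.330, and the 1,200 plate appearances of regression are illustrative assumptions, as is the function name `baseline_projection`; individual systems choose their own values and add aging, park and league adjustments that are omitted here.

```python
def baseline_projection(seasons, weights=(5, 4, 3),
                        league_rate=0.330, regression_pa=1200):
    """Project a rate statistic from up to three past seasons.

    seasons: list of (rate, plate_appearances), most recent first.
    Recent seasons get larger weights, and the result is regressed toward
    the league rate by blending in `regression_pa` league-average plate
    appearances. All default values are hypothetical.
    """
    weighted_total = 0.0
    weighted_pa = 0.0
    for (rate, pa), w in zip(seasons, weights):
        weighted_total += w * rate * pa
        weighted_pa += w * pa
    return (weighted_total + league_rate * regression_pa) / (weighted_pa + regression_pa)
```

With no playing time at all the projection falls back to the league rate, and the more plate appearances a player has accumulated, the less his projection is pulled toward the mean.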
PECOTA also makes projections for many more players than do other systems, because PECOTA relies on adjusted minor league statistics as well as major league statistics and tries to make projections for all of the players on major league expanded rosters (40 players per team) as well as other prospects.{{Citation needed|date=August 2022}}
Beginning in 2000, the Cleveland Indians developed a proprietary analytical database called DiamondView to evaluate scouting information gathered by the team; this system later incorporated player performance indicators and financial indicators, for purposes of evaluating and projecting the performance of all major league players.{{Cite web|url=http://www.cleveland.com/gameplan/index.ssf?/gameplan/more/part2.html|archiveurl=https://web.archive.org/web/20071210123924/http://www.cleveland.com/gameplan/index.ssf?%2Fgameplan%2Fmore%2Fpart2.html|url-status=dead|title=cleveland.com: The Game Plan|archivedate=December 10, 2007}} During 2008–2009, the Pittsburgh Pirates were in the process of developing MITT ("Managing, Information, Tools and Talent"), a proprietary database that integrates scouting reports, medical and contract information, and performance statistics and projections.Pat Mitsch, "Pirates are Hoping 'The MITT' Catches On," [http://www.pittsburghlive.com/x/pittsburghtrib/sports/pirates/s_634324.html Pittsburgh Tribune-Review, July 19, 2009] {{Webarchive|url=https://web.archive.org/web/20090722030053/http://www.pittsburghlive.com/x/pittsburghtrib/sports/pirates/s_634324.html |date=July 22, 2009 }}.
Updates and revisions
First introduced in 2003,Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003, cited previously. PECOTA projections are produced each year and published both in the Baseball Prospectus annual monographs and on the BaseballProspectus.com website. PECOTA has undergone several improvements since 2003. The 2006 version introduced metrics for the market valuation of players based on the predicted performance levels. The 2007 version introduced adjustments for league effects, to account for differences in the competitive environment of the two major leagues."Baseball Prospectus Chat: Nate Silver," [http://baseballprospectus.com/chat/chat.php?chatId=253 BaseballProspectus.com, January 19, 2007]. The 2008 update took into account differences in players' performance during the first and second halves of the previous season as well as platoon splits (how well a player performed against hitters or pitchers who were left- or right-handed).Steven Goldman and Christina Kahrl, Eds., Baseball Prospectus 2008 (New York: Plume, 2008), pp. viii–ix. It also took account of baserunning.Nate Silver, "Is Baserunning a Skill?" [http://www.baseballprospectus.com/unfiltered/?p=683 BaseballProspectus.com, November 29, 2007]. In 2009, Baseball Prospectus introduced in-season PECOTA projections, to update and supplement its beginning of the season projections.Eric Seidman, "In-Season PECOTAs," [http://www.baseballprospectus.com/unfiltered/?p=1348 BaseballProspectus.com, July 23, 2009]. In 2012, PECOTA substantially changed the way it weighed past years' performance in establishing the baseline for projections.Colin Wyers, "Reintroducing PECOTA: The Weighting is the Hardest Part," [http://www.baseballprospectus.com/article.php?articleid=15992 BaseballProspectus.com, February 8, 2012]. 
In addition, 10-year forecasts and percentile forecasts were added to the individual player PECOTA cards that are published on-line.Colin Wyers, "Reintroducing PECOTA," [http://www.baseballprospectus.com/article.php?articleid=16189 BaseballProspectus.com, March 12, 2012].
Accuracy
Although Baseball Prospectus promotes PECOTA commercially as "deadly accurate," all projection systems are subject to considerable uncertainty. A comparison found that PECOTA had outperformed several other forecasting systems for the 2006 season in predicting OPS. It performed nearly as well as the best of the other systems in predicting ERA.Chone Smith http://lanaheimangelfan.blogspot.com/2006/12/pecota.html Although PECOTA projections are made for well over 1000 hitters each season, the evaluation of the system included only slightly over 100 players who had a minimum of 500 major league AB and had also been included in projections by the other systems. Nate Silver's own comparison of the performance of alternative projection systems for hitters in 2007 also showed that PECOTA led the field, though a couple of others were close.Nate Silver, "2007 Hitter Projection Roundup," [http://www.baseballprospectus.com/unfiltered/?p=564#34764 BaseballProspectus.com (October 4, 2007)].
Although designed primarily for predicting individual player performance, PECOTA has also been applied to predicting team performance. For this purpose, projected team depth charts are established with projected playing times for each team member, drawing on the expert advice of the Baseball Prospectus staff. The number of runs a team will score and allow during the coming season is estimated based on the playing times and PECOTA's predicted individual performance of each player, using a "Marginal Lineup Value" algorithm created by David Tate and further developed by Keith Woolner.Keith Woolner, "Marginal Lineup Value," [http://www.stathead.com/bbeng/woolner/mlvdesc.htm StatHead.com]. A team's expected win total is based on applying an improved version of Bill James' Pythagorean Formula to the estimated number of runs scored and allowed by the roster of players under the given playing-time assumptions.On the Pythagenport formula, see Clay Davenport and Keith Woolner, "Revisiting the Pythagorean Theorem: Putting Bill James' Pythagorean Theorem To the Test," [http://www.baseballprospectus.com/article.php?articleid=342 BaseballProspectus.com, June 30, 1999] as well as the Baseball Prospectus glossary entry for "Pythagenport"[http://baseballprospectus.com/glossary/index.php?mode=viewstat&stat=136]. On the construction of the depth charts for each team and the application of PECOTA to estimating team wins, see Nate Silver, "PECOTA Projects the American League," [http://www.baseballprospectus.com/article.php?articleid=3836 BaseballProspectus.com, March 21, 2005]; and Nate Silver, "PECOTA Breaks Hearts," [http://baseballprospectus.com/article.php?articleid=4917 BaseballProspectus.com, March 29, 2006].
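The final step, converting projected runs into expected wins, can be sketched as below. The fixed exponent of 2 is Bill James' original Pythagorean form. The variable exponent in `pythagenport_exponent` is the commonly cited form of the Pythagenport refinement and is stated here as an assumption, since the text above does not give the formula; the function names are likewise hypothetical.

```python
import math

def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def pythagenport_exponent(runs_scored, runs_allowed, games):
    """Exponent that varies with the run environment (runs per game).
    Commonly cited form; treat the constants as an assumption."""
    return 1.5 * math.log10((runs_scored + runs_allowed) / games) + 0.45

def expected_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a season of `games` games."""
    x = pythagenport_exponent(runs_scored, runs_allowed, games)
    return games * pythagorean_win_pct(runs_scored, runs_allowed, x)
```

A team projected to score exactly as many runs as it allows comes out at a .500 record under either exponent; the variable exponent matters most in unusually high or low scoring environments.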
PECOTA has been used in preseason forecastse.g., Clay Davenport, "PECOTA Projected Standings: Pegging the 2009 Season," [http://baseballprospectus.com/article.php?articleid=8528 BaseballProspectus.com, February 19, 2009]. of how many wins teams will attain and in mid-season simulations of the number of wins each team will attain and its odds of reaching the playoffs.See Clay Davenport, "Playoff Odds Report: The Addition of PECOTA," [http://www.baseballprospectus.com/article.php?articleid=5036 BaseballProspectus.com, May 3, 2006] and [http://www.baseballprospectus.com/statistics/ps_oddspec.php Baseball Prospectus Statistics]. In 2006, PECOTA's preseason forecasts compared favorably to other forecasting systems (including Las Vegas betting line odds) in predicting the number of wins teams would earn during the season.Nate Silver, "Projection Reflection," [http://www.baseballprospectus.com/article.php?articleid=5609 BaseballProspectus.com, October 11, 2006]. An independent evaluation by the website Vegas Watch showed that PECOTA had the lowest error in predicting Major League team wins in 2008 of all the best known forecasts, both those that were sabermetrically based and those that relied on individual expertise."Evaluating April MLB Predictions (2008)," VegasWatch.net, [http://vegaswatch.net/2008/09/evaluating-april-mlb-predictions-2008.html September 21, 2008] and [http://vegaswatch.net/2008/09/evaluating-april-mlb-predictions-update.html September 28, 2008]. In 2009, however, PECOTA lagged behind all the well-known forecasters."Evaluating April MLB Predictions (2009)," [http://vegaswatch.net/2009/09/evaluating-april-mlb-predictions-2009.html VegasWatch, September 28, 2009].
A summary for the 2003 through 2007 seasons shows that PECOTA's average error between predicted and actual team wins generally declined, despite a rise in 2004:Nate Silver, "Braves, Angels Have Most Heart," [http://baseballprospectus.com/unfiltered/?p=792 BaseballProspectus.com, March 10, 2007].
- 2003: 5.91 wins
- 2004: 7.71 wins
- 2005: 5.14 wins
- 2006: 4.94 wins
- 2007: 4.31 wins
Silver conjectures that the improvement has come in part from taking defense into account in the forecasts beginning in 2005.
In 2008 the average error was 8.5 wins."Taking the Over on PECOTA," VegasWatch.net, [https://web.archive.org/web/20120831022300/http://vegaswatch.net/2009/02/taking-over-on-pecota.html February 8, 2009].
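An "average error" of this kind can be computed as in the sketch below. The cited sources are not quoted here on whether they use the mean absolute error or another measure such as root mean squared error, so the choice of mean absolute error and the function name `mean_absolute_error` are assumptions.

```python
def mean_absolute_error(predicted_wins, actual_wins):
    """Average of |predicted - actual| across teams."""
    if not predicted_wins or len(predicted_wins) != len(actual_wins):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted_wins, actual_wins))
    return total / len(predicted_wins)
```

For example, hypothetical forecasts of 90, 81 and 70 wins against actual totals of 95, 80 and 72 miss by 5, 1 and 2 wins, for an average error of about 2.67 wins.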
References
{{reflist}}
Sources
{{refbegin|2}}
- Jonah Keri, "'Tis the Season to Project Stats," [https://www.espn.com/espn/page2/story?page=keri/070214 ESPN.com, February 14, 2007].
- Rich Lederer, "An Unfiltered Interview with Nate Silver," [http://baseballanalysts.com/archives/2007/02/an_unfiltered_i.php BaseballAnalysts.com, February 12, 2007].
- Alan Schwarz, "Numbers Suggest Mets Are Gambling on Zambrano," [https://query.nytimes.com/gst/fullpage.html?res=9A05E3DC173EF931A1575BC0A9629C8B63&n=Top/Reference/Times%20Topics/People/S/Schwarz,%20Alan New York Times, August 22, 2004].
- Nate Silver, "The Science of Forecasting," [http://www.baseballprospectus.com/article.php?articleid=2659 BaseballProspectus.com, March 11, 2004].
- Nate Silver, "Introducing PECOTA," Baseball Prospectus 2003 (Dulles, VA: Brassey's Publishers, 2003): 507–514.
- Nate Silver, "PECOTA Takes on the Field: How'd It Fare Against Six Other Projections Systems?" [http://baseballprospectus.com/article.php?articleid=2515 BaseballProspectus.com, January 16, 2004].
- Nate Silver, "PECOTA 2004: A Look Back and a Look Ahead," Baseball Prospectus 2004 (New York: Workman Publishers, 2004): 5–10.
- Nate Silver, "Rearranging PECOTA," Baseball Prospectus 2006 (New York: Workman Publishers, 2006): 6–11.
- Nate Silver, "Why Was Kevin Maas a Bust?" Baseball Between the Numbers, Jonah Keri, Ed. (New York: Basic Books, 2006): 253–271.
- Dave van Dyke, "Predictions: Ignore Them at Your Peril," [http://www.chicagotribune.com/news/chi-09-white-sox-chicago-spring-mar09,0,7769221.story Chicago Tribune, March 9, 2008].
- Childs Walker, "Baseball Prospectus Makes Predicting Future Thing of Past," Baltimore Sun, February 21, 2006.
{{refend}}