SAFE: Spatial Aggregate Fielding Evaluation



   1. Motivation

   2. Grounder Methodology

   3. Fly/Liner Methodology

   4. Combining BIP types

   5. Results for Infielders

   6. Results for Outfielders

 Comments? Email:

   Shane T. Jensen



 Collaborators:

   Kenny Shirley
   Abraham Wyner

Motivation

One of the hardest aspects of baseball to quantify and evaluate is fielding ability. Most events in baseball, such as hitting events, are discrete, which makes them easy to tabulate and model probabilistically. The central difficulty with fielding is that we are trying to evaluate players on a continuous playing surface where we must take into account not just whether a successful play was made, but whether a successful play was possible. The much-maligned error statistic is a subjective attempt at discretising this phenomenon: players are assigned an error if the official scorer deems that their unsuccessful play should have been successful. However, tabulating errors isn't a good measure of ability without a corresponding measure that credits a player for making a play that most players wouldn't have.

Recent techniques such as Ultimate Zone Rating or the Plus-Minus system from The Fielding Bible (a must-read, by the way!) are based on the tabulation of both positive and negative fielding events. These statistics are more detailed and more accurate measures of fielding ability than error totals. These techniques are also getting a bit more attention in the regular media, as evidenced by this recent Yahoo! Sports article. However, despite being obvious improvements on previous methods, both of these approaches are still based on dividing the baseball field into discrete zones and vectors, and tabulating events within each zone. Ideally, the baseball field could be treated as the continuous playing surface that it actually is, instead of a set of zones or vectors. Instead of tabulating fielding events within discrete zones, we fit continuous probability distributions to each fielder based on their past fielding events. The closest technique (at least in spirit) to our approach is the work by David Pinto.


Methodology for Grounder Balls-In-Play (g-bip)

Our raw data come from Baseball Info Solutions (BIS), the same source used for The Fielding Bible. For each grounder ball-in-play (g-bip), we have the (x,y) coordinates in the field where the g-bip was fielded, a "velocity" classification (ranging from 1 to 5) for the g-bip, as well as the number of outs made on the play. We defined any play on which one or more outs were made as "successful". Our evaluation procedure consisted of the following steps:

1. Estimating starting locations for each position

Our BIS data does not provide a key piece of information for each g-bip: the location of each fielder before the ball was hit. We estimate the starting location for each fielder as the (x,y) location in the field where each position has the highest overall probability of making a successful play. For each grounder, we then convert the g-bip coordinates into the angle at which the grounder was hit off the bat. An angle of 0 degrees corresponds to the 3rd base line, while an angle of 90 degrees corresponds to the 1st base line.
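The conversion from field coordinates to a spray angle can be sketched as follows. The BIS coordinate system is not described here, so this assumes a hypothetical frame with home plate at the origin, the y-axis pointing to straightaway center field, and the foul lines at 45 degrees on either side of it (3rd base line toward negative x). The function name `spray_angle` is illustrative only.

```python
import math


def spray_angle(x: float, y: float) -> float:
    """Angle in degrees of a batted ball: 0 = 3rd base line, 90 = 1st base line.

    Assumes a hypothetical frame: home plate at (0, 0), +y toward straightaway
    center field, 3rd base line along (-1, 1) and 1st base line along (1, 1).
    """
    # atan2 measures counter-clockwise from +x, so the 1st base line sits at
    # 45 degrees and the 3rd base line at 135 degrees.
    raw = math.degrees(math.atan2(y, x))
    # Flip and shift so the 3rd base line maps to 0 and the 1st base line to 90.
    return 135.0 - raw


# A grounder up the middle travels along the y-axis, halfway between the lines,
# so this is about 45 degrees.
print(spray_angle(0.0, 120.0))
```

If the real data place home plate elsewhere or scale the axes differently, only the translation into this frame changes; the angle formula stays the same.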

2. Fitting smooth models for the average fielder at each position

We model the probability of a successful play on a grounder as a smooth function of the angle between the fielder's starting location and the g-bip path. We model a different function for each velocity category, and also allow a different function for fielders moving to the left or to the right. These models are calculated using the data from all infielders, and so represent the ability of an aggregate fielder at each position. In the figure below, we show the probability model at each position for successful fielding of grounders with an intermediate velocity.

[Figure: probability of a successful play on intermediate-velocity grounders, as a function of angle, for each infield position]

We see that each position has a distinct probability model. Note that pitchers seem to have a much larger range than the other infield positions only because they are much closer to home plate and therefore do not have to travel as much distance to cover the same range of angles from home plate.
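A quick calculation illustrates the pitcher effect. Assume both fielders can cover the same hypothetical 10 feet sideways; place the pitcher at the 60.5-foot distance from the rubber to home plate and a shortstop at a hypothetical depth of 140 feet. The angle at home plate that this reach covers is then much larger for the pitcher. The depth and reach values are illustrative assumptions, not estimates from the data.

```python
import math


def angular_reach(lateral_ft: float, depth_ft: float) -> float:
    """Angle (degrees) at home plate covered by moving lateral_ft sideways
    at a distance of depth_ft from home plate."""
    return math.degrees(math.atan2(lateral_ft, depth_ft))


REACH_FT = 10.0        # hypothetical lateral reach, same for both fielders
PITCHER_FT = 60.5      # distance from the pitching rubber to home plate
SHORTSTOP_FT = 140.0   # hypothetical shortstop depth, for illustration only

pitcher = angular_reach(REACH_FT, PITCHER_FT)
shortstop = angular_reach(REACH_FT, SHORTSTOP_FT)
print(f"pitcher: {pitcher:.1f} deg, shortstop: {shortstop:.1f} deg")
```

This prints roughly 9.4 degrees for the pitcher versus 4.1 for the shortstop: the same physical range covers more than twice the angle.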

3. Fitting player-specific models and calculating differences

We calculate the same probability models using only the data for each individual fielder and allowing different parameters for each player. Since we have different models for each individual player, we can quantify the difference between players by comparing their individual probabilities of making an out relative to the aggregate probability of making an out. As an example, the figure below illustrates the comparison on grounders between the aggregate model for the SS position and the individual models for the best and worst shortstops.

[Figure: grounder success probability for the aggregate SS model versus the individual models for the best and worst shortstops]
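Given fitted curves, the player-versus-aggregate difference at each angle can be sketched as below. This assumes, hypothetically, logistic curves in the absolute angular gap with separate (intercept, slope) pairs for balls toward the 3rd base and 1st base sides, and a shared starting angle for the position; all parameter values are made up for illustration.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def success_prob(params, start_angle, angle):
    """Success probability for a g-bip at `angle` (degrees) for a fielder
    whose estimated starting position is at `start_angle`.

    params maps a side ("3B" for balls toward the 3rd base line, "1B" for
    balls toward the 1st base line) to a hypothetical (intercept, slope)
    pair of a logistic curve in the absolute angular gap.
    """
    gap = angle - start_angle
    a, b = params["3B" if gap < 0 else "1B"]
    return _sigmoid(a + b * abs(gap))


def difference_curve(player, aggregate, start_angle, angles):
    """D(theta) = P_player(success | theta) - P_aggregate(success | theta)."""
    return [success_prob(player, start_angle, t) - success_prob(aggregate, start_angle, t)
            for t in angles]


# Hypothetical parameters: this player's curves fall off more slowly than the aggregate.
aggregate = {"3B": (2.0, -0.30), "1B": (2.0, -0.35)}
player = {"3B": (2.0, -0.22), "1B": (2.0, -0.30)}
angles = list(range(0, 91))
D = difference_curve(player, aggregate, start_angle=32.0, angles=angles)
```

Positive entries of `D` mark angles where this player turns more g-bips into outs than the aggregate fielder at the position.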

4. Weighted sum of player-specific differences

For each possible angle, we can calculate the difference D between a particular fielder's probability of success and the aggregate probability of success. A rough measure of fielder ability is the sum over all possible angles of this difference, (individual player - aggregate) in probability of making a successful play, or equivalently (aggregate - individual player) in probability of not making one. This sum is carried out by simple numerical integration. However, since not all angles occur with equal frequency, our SAFE measures are actually calculated as a frequency-weighted sum, so that more frequent angles count more. In addition, our sum is also weighted by the average run consequence of each angle, which allows us to take into account the different consequences of grounders to different areas: e.g., a missed grounder down the first base line leads to more bases than a missed grounder to the shortstop. Thus, for an individual player, their SAFE statistic can be interpreted as their expected runs saved (or cost) relative to the average fielder. A good fielder will have a large positive SAFE, which means a high number of runs saved, whereas a bad fielder will have a large negative SAFE, which means a high number of runs cost.
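The frequency- and run-weighted sum can be sketched with the trapezoid rule on a grid of angles. This assumes the frequency weight is a density f(theta) over angle scaled by a hypothetical number of g-bips `n_bip`, and that r(theta) is the average run consequence of a missed grounder at that angle; the function name and inputs are illustrative, not the authors' code.

```python
def safe_grounders(angles, diff, density, run_value, n_bip):
    """Trapezoid-rule approximation of

        SAFE = n_bip * integral over theta of f(theta) * r(theta) * D(theta)

    angles:    increasing grid of spray angles (degrees)
    diff:      D(theta) = P_player(success) - P_aggregate(success)
    density:   f(theta), g-bip frequency density over angle (integrates to 1)
    run_value: r(theta), average runs cost by a missed g-bip at theta
    n_bip:     number of g-bips faced (hypothetical scaling)
    """
    total = 0.0
    for i in range(len(angles) - 1):
        h = angles[i + 1] - angles[i]
        left = density[i] * run_value[i] * diff[i]
        right = density[i + 1] * run_value[i + 1] * diff[i + 1]
        total += 0.5 * h * (left + right)
    return n_bip * total


# Hypothetical inputs: uniform spray density over 0-90 degrees, a flat run
# value of 0.5, and a player who is 1 percentage point better everywhere.
angles = list(range(91))
safe = safe_grounders(angles, [0.01] * 91, [1 / 90] * 91, [0.5] * 91, n_bip=500)
```

Because `diff` is individual minus aggregate probability of success, better-than-average fielders come out positive, matching the sign convention above (here about 2.5 runs saved).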


Methodology for Balls-In-Play in the air (a-bip)

Our raw data come from Baseball Info Solutions (BIS), the same source used for The Fielding Bible. For each ball-in-play hit into the air (a-bip), we have the (x,y) coordinates in the field where the a-bip was fielded, a "velocity" classification (ranging from 1 to 5) for the a-bip, as well as the number of outs made on the play. We defined any play on which one or more outs were made as "successful". Note that our balls-in-play into the air are subdivided into three different types: fly balls, liners, and pop-ups. The following evaluation procedure is performed for each a-bip type separately:

1. Estimating starting locations for each position

Our BIS data does not provide a key piece of information for each a-bip: the location of each fielder before the ball was hit. We estimate the starting location for each fielder as the (x,y) location in the field where each position has the highest overall probability of making a successful play.
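"The location with the highest overall probability of making a successful play" can be made concrete in several ways. This sketch swaps in one simple choice: a Gaussian-kernel (Nadaraya-Watson) estimate of a position's success rate, evaluated on a grid of candidate points, keeping the best-supported maximum. The kernel, the bandwidth, the `min_weight` cutoff, and the name `estimate_start` are all assumptions for illustration; the same idea applies to the g-bip data.

```python
import math


def estimate_start(plays, grid, bandwidth=15.0, min_weight=1.0):
    """Grid point where a position's smoothed success probability is highest.

    plays:      (x, y, success) for every bip; success means this position made the out
    grid:       candidate (x, y) starting locations
    bandwidth:  Gaussian kernel bandwidth in feet (hypothetical default)
    min_weight: skip grid points with too little nearby data (hypothetical cutoff)
    """
    best, best_p = None, -1.0
    for gx, gy in grid:
        num = den = 0.0
        for x, y, success in plays:
            # Gaussian kernel weight: nearby bips count more.
            w = math.exp(-((x - gx) ** 2 + (y - gy) ** 2) / (2.0 * bandwidth ** 2))
            num += w * (1.0 if success else 0.0)
            den += w
        if den >= min_weight:
            p = num / den  # Nadaraya-Watson estimate of P(success) at (gx, gy)
            if p > best_p:
                best, best_p = (gx, gy), p
    return best, best_p
```

The `min_weight` cutoff keeps sparsely populated corners of the field, where a handful of plays can produce an extreme rate, from being chosen as a starting location.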

2. Fitting smooth models for the average fielder at each position

We model the probability of a successful play on an a-bip as a smooth function of the distance between the fielder's starting location and the a-bip coordinates. We model a different function for each velocity category, and also allow a different function for fielders moving to the left or to the right. These models are calculated using the data from all fielders at each position, and so represent the ability of an aggregate fielder at each position.
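Before fitting the distance-based curves, each a-bip has to be reduced to a distance and a direction and grouped by velocity and direction. A minimal sketch, assuming a hypothetical convention in which "left" and "right" are read off the x-axis as seen from home plate; the names `abip_features` and `group_by_cell` are illustrative.

```python
import math
from collections import defaultdict


def abip_features(start, landing):
    """Distance (ft) from the estimated starting location to the a-bip
    location, plus the side of the move along the x-axis."""
    dx = landing[0] - start[0]
    dy = landing[1] - start[1]
    distance = math.hypot(dx, dy)
    # Hypothetical convention: "left"/"right" as seen from home plate,
    # assuming +x points toward the 1st base side of the field.
    side = "left" if dx < 0 else "right"
    return distance, side


def group_by_cell(abips, start):
    """abips: iterable of (x, y, velocity, success).

    Returns {(velocity, side): [(distance, success), ...]}, the inputs for
    one smooth success-probability curve per velocity category and side.
    """
    cells = defaultdict(list)
    for x, y, velocity, success in abips:
        distance, side = abip_features(start, (x, y))
        cells[(velocity, side)].append((distance, success))
    return dict(cells)
```

Each resulting list can then be passed to whatever smooth curve-fitting routine is used for that cell.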

3. Fitting player-specific models and calculating differences

We calculate the same probability models using only the data for each individual fielder and allowing different parameters for each player. Since we have different models for each individual player, we can quantify the difference between players by comparing their individual probabilities of making an out relative to the aggregate probability of making an out. As an example, the figure below illustrates the comparison on fly balls between the aggregate model for the CF position and the individual model for Darin Erstad in 2002.

[Figure: fly-ball success probability for the aggregate CF model versus the individual model for Darin Erstad (2002)]
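A rough, nonparametric way to sanity-check a comparison like the one above is to bin plays by distance and compare raw success rates. This is a swapped-in diagnostic, not the smooth player-specific model described in the text, and raw rates are noisy wherever a bin has few plays; the 10-foot bin width and function names are assumptions.

```python
import math
from collections import defaultdict


def binned_rates(records, bin_width=10.0):
    """records: (distance, success). Returns {bin_start: observed success rate}."""
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [outs made, chances]
    for distance, success in records:
        start = math.floor(distance / bin_width) * bin_width
        counts[start][0] += 1 if success else 0
        counts[start][1] += 1
    return {b: made / n for b, (made, n) in sorted(counts.items())}


def rate_differences(player, aggregate, bin_width=10.0):
    """Player minus aggregate success rate, for bins where both have data."""
    p = binned_rates(player, bin_width)
    a = binned_rates(aggregate, bin_width)
    return {b: p[b] - a[b] for b in p if b in a}
```

If a fitted individual curve sits well away from these binned differences in bins with plenty of data, that is a hint the curve's functional form is too restrictive.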

4. Weighted sum of player-specific differences

For each possible location, we can calculate the difference D between a particular fielder's probability of success and the aggregate probability of success. A rough measure of fielder ability is the sum over all possible (x,y) coordinates of this difference, (individual player - aggregate) in probability of making a successful play, or equivalently (aggregate - individual player) in probability of not making one. This sum is carried out by simple numerical integration. However, since not all a-bip coordinates occur with equal frequency, our SAFE measures are actually calculated as a frequency-weighted sum, so that more frequent locations count more. In addition, our sum is also weighted by the average run consequence of each location, which allows us to take into account the different consequences of a-bips to different areas: e.g., a missed a-bip into an outfield power alley has a higher consequence than a missed pop-up in the shallow outfield. The figure below shows the major differences in consequences of a-bips to different areas of the field.

[Figure: run consequences of missed a-bips in different areas of the field]

Thus, for an individual player, their SAFE statistic can be interpreted as their expected runs cost/saved relative to the average fielder. A good fielder will have a large positive SAFE, which means a high number of runs saved, whereas a bad fielder will have a large negative SAFE, which means a high number of runs cost.
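For a-bips the weighted sum runs over the two-dimensional field rather than over angles. A minimal sketch using the midpoint rule on a square grid, assuming hypothetical functions for D(x,y), the a-bip frequency density f(x,y), and the run consequence r(x,y), plus a hypothetical count `n_bip`; none of these names come from the original analysis.

```python
def safe_over_field(diff, density, run_value, n_bip, x_range, y_range, step):
    """Midpoint-rule approximation of

        SAFE = n_bip * double integral of f(x, y) * r(x, y) * D(x, y) dx dy

    diff, density, run_value: functions of (x, y) giving D, f and r
    x_range, y_range:         (low, high) bounds of the region in feet
    step:                     grid cell width in feet
    """
    nx = round((x_range[1] - x_range[0]) / step)
    ny = round((y_range[1] - y_range[0]) / step)
    total = 0.0
    for i in range(nx):
        x = x_range[0] + (i + 0.5) * step  # cell midpoint in x
        for j in range(ny):
            y = y_range[0] + (j + 0.5) * step  # cell midpoint in y
            total += density(x, y) * run_value(x, y) * diff(x, y)
    return n_bip * total * step * step


# Hypothetical check: uniform density over a 100 ft x 100 ft region, a flat
# run value of 0.6, and a fielder 2 percentage points better everywhere.
safe = safe_over_field(lambda x, y: 0.02, lambda x, y: 1 / 10000,
                       lambda x, y: 0.6, 300, (0, 100), (0, 100), 10.0)
```

With real inputs, `run_value` would be large in the power alleys and small for shallow pop-ups, so the same probability edge is worth more runs in the gaps.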


Combining SAFE across BIP types

We have described our methodology for calculating SAFE for grounders as well as for balls hit into the air (fly balls, liners, and pop-ups). For each player in each season (2002-2005), the SAFE values for the individual ball-in-play types are summed over all the ball-in-play types relevant to that player. For infielders, the combined SAFE value consists predominantly of grounders (g-bip) but also includes infield flies and liners. For outfielders, the combined SAFE value is aggregated across all ball-in-the-air types (fly balls and liners). These combined SAFE values are available in raw form at the link below:

Year-by-year SAFE values in Excel format: Version 03
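Combining across ball-in-play types is a plain per-player, per-season sum. A minimal sketch, assuming a hypothetical row layout with one SAFE value per ball-in-play type; the player name and values are made up.

```python
from collections import defaultdict


def combine_safe(rows):
    """rows: (player, season, bip_type, safe) with one SAFE value per
    ball-in-play type. Returns {(player, season): combined SAFE}."""
    totals = defaultdict(float)
    for player, season, bip_type, safe in rows:
        totals[(player, season)] += safe  # sum over all bip types for that player-season
    return dict(totals)


rows = [
    ("Player A", 2003, "grounder", 4.0),   # hypothetical values
    ("Player A", 2003, "liner", -0.5),
    ("Player A", 2003, "pop-up", 0.25),
]
print(combine_safe(rows))  # {('Player A', 2003): 3.75}
```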

UPDATE TO VERSION 03 (2008-10-01): The results have been updated to include 95% posterior intervals in addition to the posterior mean for each player-year. The old version 02 results can still be found here.

UPDATE TO VERSION 02 (2007-12-06): The results have been updated using better models for individual fielder curves, BIP frequency, and shared consequence. The old version 01 results can still be found here.
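If a player-year's posterior is available as a list of simulation draws (an assumption; the sampler behind these results is not described here), the reported posterior mean and central 95% interval can be summarized as below. This uses Python's standard `statistics` module (3.8 or later); `posterior_summary` is an illustrative name.

```python
import statistics


def posterior_summary(draws):
    """Posterior mean and central 95% interval from a list of posterior draws."""
    # quantiles with n=40 returns the 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]
```

A wide interval that straddles zero signals that a player-year's SAFE cannot be distinguished from average fielding.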


Results for Infielders

Below, we give the SAFE values for each infielder, averaged over the 2002-2005 seasons. Positive values indicate runs saved, whereas negative values indicate runs cost. Within each position, fielders are ranked from best to worst. Only fielders for which we have enough data (at least 600 BIP faced) are included. These averages are weighted by the number of BIP faced by the player in each year, but keep in mind that some of the values below may be based on only one or two years' worth of data. As mentioned above, the full year-by-year data is available here.
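The BIP-weighted averaging, the 600-BIP cutoff, and the per-position ranking can be sketched as follows. This assumes, hypothetically, that the cutoff applies to total BIP across all seasons and that rows carry the BIP count for each player-season; names and values are made up.

```python
from collections import defaultdict

MIN_BIP = 600  # minimum total BIP faced to be listed (assumed to be a career total)


def career_safe(rows, min_bip=MIN_BIP):
    """rows: (player, position, season, bip_faced, safe).

    Returns {position: [(player, weighted mean SAFE), ...]}, best first,
    for players with at least min_bip total BIP at that position.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (player, pos) -> [sum bip*safe, sum bip]
    for player, pos, season, bip, safe in rows:
        acc = sums[(player, pos)]
        acc[0] += bip * safe
        acc[1] += bip
    by_pos = defaultdict(list)
    for (player, pos), (weighted, n) in sums.items():
        if n >= min_bip:
            by_pos[pos].append((player, weighted / n))
    for pos in by_pos:
        by_pos[pos].sort(key=lambda t: t[1], reverse=True)  # best to worst
    return dict(by_pos)


rows = [
    ("Player A", "SS", 2002, 400, 2.0),   # hypothetical values
    ("Player A", "SS", 2003, 400, 5.0),
    ("Player B", "SS", 2002, 700, -1.0),
    ("Player C", "SS", 2002, 300, 10.0),  # below the cutoff, so not listed
]
print(career_safe(rows))
```

Weighting by BIP keeps a short, extreme season from dominating a player's average, although, as noted above, a one-season average is still only as reliable as that season's data.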

First Baseman
Second Baseman
Third Baseman
Shortstop
1B Ken Harvey 2.85 2B Craig Counsell 10.97 3B Damian Rolls 7.75 SS Clint Barmes 11.63
1B Doug Mientkiewicz 1.92 2B Brandon Phillips 8.72 3B Craig Counsell 6.79 SS Alex Rodriguez 10.40
1B Eric Karros 1.75 2B Chase Utley 8.34 3B Placido Polanco 5.26 SS Jason Bartlett 9.37
1B Mike Sweeney 1.46 2B Orlando Hudson 6.67 3B David Bell 5.13 SS Adam Everett 8.72
1B Kevin Young 1.13 2B Rey Sanchez 5.51 3B Sean Burroughs 4.47 SS Craig Counsell 6.94
1B Scott Spiezio 1.09 2B Junior Spivey 5.41 3B Pedro Feliz 4.14 SS Bill Hall 6.07
1B Mark Teixeira 1.08 2B Bodhi Hart 4.77 3B Scott Rolen 3.14 SS Shane Halter 4.50
1B Nick Johnson 0.92 2B Nick Punto 4.62 3B Adrian Beltre 3.13 SS Rafael Furcal 4.03
1B Albert Pujols 0.80 2B Rickie Weeks 4.41 3B Hank Blalock 2.89 SS Chris Gomez 3.68
1B Ben Broussard 0.78 2B Keith Ginter 4.23 3B Chone Figgins 2.37 SS Jose Valentin 3.59
1B Julio Franco 0.70 2B Pokey Reese 3.91 3B Aaron Boone 2.30 SS James Hardy 2.86
1B Justin Morneau 0.69 2B Mark Ellis 3.55 3B Chad Tracy 1.84 SS Jack Wilson 2.79
1B Scott Hatteberg 0.61 2B Placido Polanco 3.44 3B Chipper Jones 1.83 SS Edgar Renteria 2.78
1B Darin Erstad 0.26 2B Adam Kennedy 2.88 3B Jared Sandberg 1.70 SS Julio Lugo 2.74
1B Travis Lee 0.24 2B Brian Roberts 2.81 3B Jose Hernandez 1.59 SS Juan Uribe 2.67
1B Phil Nevin 0.13 2B Jerry Hairston Jr. 2.73 3B Chris Stynes 1.18 SS Mike Bordick 2.58
1B John Olerud -0.12 2B Tony Womack 2.60 3B Jose Valentin 1.15 SS Juan Castro 1.82
1B Paul Konerko -0.17 2B D'Angelo Jimenez 2.34 3B Jeff Cirillo 1.12 SS Royce Clayton 1.52
1B Todd Helton -0.42 2B Mark Grudzielanek 1.91 3B Corey Koskie 0.97 SS John McDonald 1.35
1B Carlos Pena -0.46 2B Michael Young 1.78 3B Morgan Ensberg 0.77 SS Miguel Tejada 1.19
1B Tino Martinez -0.59 2B Jeff Reboulet 1.51 3B Joe Crede 0.75 SS David Eckstein 0.98
1B Jeff Bagwell -0.65 2B Warren Morris 1.38 3B Bill Mueller 0.57 SS Orlando Cabrera 0.81
1B Kevin Millar -0.94 2B Fernando Vina 1.37 3B Robin Ventura 0.57 SS Bobby Crosby 0.77
1B Derrek Lee -0.98 2B Brent Butler 1.29 3B Alex Rodriguez 0.40 SS Cesar Izturis 0.43
1B Adam LaRoche -1.15 2B Ron Belliard 1.24 3B Vinny Castilla 0.31 SS Rey Ordonez 0.40
1B Sean Casey -1.29 2B John McDonald 1.19 3B Geoff Blum 0.17 SS Barry Larkin 0.33
1B Christopher Shelton -1.33 2B Marlon Anderson 0.92 3B Eric Chavez 0.15 SS Nomar Garciaparra 0.05
1B Jim Thome -1.38 2B Marcus Giles 0.81 3B Shea Hillenbrand -0.10 SS Alex Gonzalez -0.19
1B Jack Snow -1.40 2B Aaron Miles 0.07 3B Rob Mackowiak -0.32 SS Omar Infante -0.36
1B Robert Fick -1.45 2B Tadahito Iguchi 0.05 3B Brandon Inge -0.44 SS Kazuo Matsui -0.79
1B Shea Hillenbrand -1.46 2B Carlos Febles 0.04 3B Garrett Atkins -0.73 SS Carlos Guillen -1.33
1B Richie Sexson -1.50 2B Nicholas Green -0.08 3B Edgardo Alfonzo -0.74 SS Neifi Perez -1.55
1B Jeff Conine -1.55 2B Marcos Scutaro -0.87 3B David Wright -0.79 SS Jimmy Rollins -1.97
1B Daryle Ward -1.62 2B Scott Hairston -1.36 3B Wes Helms -0.91 SS Tony Womack -2.06
1B HeeSeop Choi -1.72 2B Abraham Nunez -1.49 3B Russell Branyan -0.94 SS Alex Cintron -2.16
1B Lyle Overbay -1.90 2B Mark Bellhorn -1.71 3B Aramis Ramirez -1.09 SS Omar Vizquel -2.19
1B Rafael Palmeiro -1.91 2B Willie Harris -1.72 3B Joe Randa -1.19 SS Jose Reyes -2.44
1B Wil Cordero -2.05 2B Juan Uribe -1.93 3B Melvin Mora -1.46 SS Khalil Greene -3.00
1B Tony Clark -2.20 2B Damian Jackson -2.09 3B Casey Blake -1.72 SS Jose Vizcaino -3.24
1B Jason Giambi -2.28 2B Luis Gonzalez -2.23 3B Alex Gonzalez -3.01 SS Rey Sanchez -3.40
1B Carlos Delgado -2.77 2B Alex Cora -2.28 3B Eric Hinske -3.14 SS Cristian Guzman -4.08
1B Randall Simon -2.83 2B Luis Castillo -2.42 3B Tony Batista -3.16 SS Chris Woodward -4.24
1B Lee Stevens -2.83 2B Mark Loretta -2.42 3B Troy Glaus -3.25 SS Jose Hernandez -4.24
1B Ryan Klesko -2.86 2B Jose Vidro -2.43 3B Ty Wigginton -3.33 SS Alex Gonzalez -4.40
1B Eric Hinske -3.01 2B Frank Menechino -2.60 3B Mark Teahen -4.18 SS Andy Fox -4.41
1B Matt Stairs -3.47 2B Ray Durham -2.69 3B Todd Zeile -4.28 SS Angel Berroa -5.58
1B Steve Cox -4.10 2B Tony Graffanino -2.81 3B Phil Nevin -4.59 SS Ramon Santiago -5.94
1B Mo Vaughn -4.91 2B Jeff Kent -2.89 3B Eric Munson -4.70 SS Felipe Lopez -6.07
1B Fred McGriff -5.05 2B Omar Infante -2.98 3B Mike Lowell -4.73 SS Ramon Vazquez -7.01
2B Damion Easley -3.10 3B Chris Truby -5.00 SS Jose Lopez -7.14
2B Todd Walker -3.30 3B Fernando Tatis -6.47 SS Rich Aurilia -7.20
2B Craig Biggio -3.39 3B Michael Cuddyer -7.22 SS Deivi Cruz -8.12
2B Brent Abernathy -3.60 3B Travis Fryman -9.30 SS Russ Adams -8.26
2B Bobby Hill -3.64 SS Jhonny Peralta -10.47
2B Jorge Cantu -3.71 SS Michael Young -13.29
2B Alfonso Soriano -4.21 SS Derek Jeter -13.81
2B Desi Relaford -4.69
2B Eric Young -4.83
2B Jose Castillo -5.28
2B Keith Lockhart -6.33
2B Ramon Vazquez -6.55
2B Bret Boone -7.62
2B Ruben Gotay -7.64
2B Ricky Gutierrez -8.20
2B Luis Rivas -8.24
2B Robinson Cano -8.39
2B Miguel Cairo -8.60
2B Roberto Alomar -9.68
2B Enrique Wilson -13.13


Results for Outfielders

Below, we give the SAFE values for each outfielder, averaged over the 2002-2005 seasons. Positive values indicate runs saved, whereas negative values indicate runs cost. Within each position, fielders are ranked from best to worst. Only fielders for which we have enough data (at least 600 BIP faced) are included. These averages are weighted by the number of BIP faced by the player in each year, but keep in mind that some of the values below may be based on only one or two years' worth of data. As mentioned above, the full year-by-year data is available here.

Left Fielder
Center Fielder
Right Fielder
LF Coco Crisp 9.22 CF Jason Michaels 12.41 RF Gary Matthews Jr. 6.78
LF Reed Johnson 7.59 CF Andruw Jones 8.84 RF Trot Nixon 5.56
LF Carl Crawford 6.95 CF Darin Erstad 7.59 RF Dustan Mohr 4.19
LF Scott Podsednik 6.35 CF Aaron Rowand 5.58 RF Xavier Nady 4.11
LF Melvin Mora 6.19 CF Exavier Logan 5.18 RF Richard Hidalgo 3.84
LF Jayson Werth 5.30 CF Jeremy Reed 5.12 RF Jose Guillen 3.30
LF Pat Burrell 2.79 CF Grady Sizemore 4.81 RF Geoff Jenkins 3.24
LF Craig Monroe 2.32 CF Mike Cameron 4.76 RF J.D. Drew 3.11
LF Brian Jordan 1.99 CF Gary Matthews Jr. 4.67 RF Nicholas Swisher 3.10
LF Barry Bonds 1.90 CF Jim Edmonds 4.24 RF Casey Blake 2.84
LF Randy Winn 1.57 CF Willy Taveras 3.85 RF Jose Cruz 2.58
LF Brad Wilkerson 1.55 CF Wily Pena 3.75 RF Brady Clark 2.45
LF Garret Anderson 1.38 CF Doug Glanville 3.50 RF Gabe Kapler 2.42
LF Eric Byrnes 1.12 CF Torii Hunter 3.36 RF Jeromy Burnitz 2.24
LF Shannon Stewart 1.00 CF Corey Patterson 2.98 RF Jason Lane 1.98
LF Matthew Holliday 1.00 CF Brady Clark 2.95 RF Ichiro Suzuki 1.76
LF Luis Gonzalez 0.94 CF Marlon Byrd 2.41 RF Jody Gerut 1.76
LF Jacque Jones 0.68 CF Jay Payton 2.02 RF Matt Lawton 1.66
LF Chipper Jones 0.66 CF Juan Pierre 1.52 RF Alexis Rios 1.35
LF Albert Pujols 0.41 CF Timo Perez 1.41 RF Austin Kearns 1.07
LF Terrence Long 0.34 CF Mark Kotsay 1.06 RF Shawn Green 0.88
LF Geoff Jenkins 0.27 CF Chone Figgins 0.52 RF Vladimir Guerrero 0.54
LF Moises Alou 0.15 CF Covelli Crisp 0.49 RF Brian Giles 0.40
LF Jason Bay -0.14 CF Kenny Lofton 0.42 RF Bobby Kielty 0.21
LF Kevin Millar -0.49 CF Dave Roberts 0.37 RF Bobby Higginson 0.07
LF Jose Guillen -0.75 CF Scott Podsednik -0.16 RF Reggie Sanders 0.01
LF Adam Dunn -0.84 CF Tike Redman -0.19 RF Karim Garcia -0.06
LF David Dellucci -1.00 CF Luis Matos -0.20 RF Jay Gibbons -0.17
LF Jay Payton -1.39 CF Lance Berkman -0.45 RF Bobby Abreu -0.20
LF Frank Catalanotto -1.40 CF Lew Ford -0.59 RF Sammy Sosa -0.31
LF Kevin Mench -1.41 CF Chris Singleton -0.75 RF Juan Encarnacion -0.45
LF Jeff Conine -1.64 CF Carlos Beltran -0.87 RF Emil Brown -0.65
LF Rondell White -1.70 CF Milton Bradley -1.04 RF Jacque Jones -0.79
LF Carlos Lee -1.72 CF Rocco Baldelli -1.24 RF Reed Johnson -0.86
LF Ray Lankford -2.06 CF Carl Everett -1.58 RF Roger Cedeno -1.06
LF Lance Berkman -2.21 CF Vernon Wells -1.63 RF Bradley Hawpe -1.15
LF Raul Ibanez -2.55 CF David DeJesus -1.82 RF Juan Rivera -1.56
LF Larry Bigbie -2.58 CF Endy Chavez -1.90 RF Danny Bautista -1.67
LF Ryan Klesko -2.94 CF Wendell Magee -2.05 RF Aaron Guiel -1.73
LF Daryle Ward -2.95 CF Alex Sanchez -2.80 RF Matt Stairs -2.08
LF Cliff Floyd -3.15 CF Laynce Nix -2.85 RF Raul Mondesi -2.13
LF Brian Giles -3.28 CF Preston Wilson -3.09 RF Robert Fick -2.36
LF Todd Hollandsworth -3.53 CF Tsuyoshi Shinjo -3.34 RF Kevin Mench -2.37
LF Matt Lawton -4.39 CF Craig Biggio -3.41 RF John VanderWal -2.58
LF Roger Cedeno -4.66 CF Johnny Damon -3.45 RF Aubrey Huff -2.61
LF Hideki Matsui -5.49 CF Steve Finley -3.63 RF Magglio Ordonez -2.79
LF Reggie Sanders -5.55 CF Terrence Long -3.70 RF Larry Walker -2.87
LF Miguel Cabrera -6.10 CF Randy Winn -3.81 RF Tim Salmon -3.13
LF Manny Ramirez -6.55 CF Brad Wilkerson -4.54 RF Michael Tucker -3.31
LF Bobby Higginson -7.95 CF Marquis Grissom -5.09 RF Craig Monroe -3.41
CF Garret Anderson -7.06 RF Miguel Cabrera -4.01
CF Luis Terrero -7.58 RF Jermaine Dye -4.24
CF Ken Griffey Jr. -8.07 RF Ben Grieve -4.30
CF Bernie Williams -8.76 RF Bubba Trammell -4.31
RF Craig Wilson -4.77
RF Gary Sheffield -4.92
RF Juan Gonzalez -5.25
RF Wily Pena -6.77