This article is part of our The Z Files series.
Recency bias and fear of missing out are two strong forces, often unduly influencing rankings and player selection. One of the prime examples is playoff performance. Every season, there are a handful of players elevated or penalized based on postseason numbers, prompting me to tweet the following poll:
From a purely analytical perspective, the answer should be yes or no. What's good for the goose is good for the gander. However, as noted, that's not the way the human psyche works.
A plausible argument can be tendered for and against inclusion. Before discussing each, I wonder whether the results would have differed had the question been posed following a full 162-game season. It's possible the shortened 2020 campaign influenced the responses, as people may have felt the playoffs added more data to an unusually limited sample.
My suspicion is if the poll were presented after a normal season and not while a playoff hero was rounding the bases after slugging his umpteenth October homer, "No" would have dominated. There are two chief reasons for this. First, the sample would have been cited as being too small. The player happened to get hot or cold in the playoffs and the trend would have flipped at some point during the regular season. Second, it isn't fair some players had the chance to generate more data while others sat at home. To be honest, this would have been my initial reaction. No, playoff stats shouldn't count.
However, upon reflection, I'm not so sure. The sample size aspect can be dealt with by focusing on skills as opposed to outcomes. Most hot streaks are embellished with good luck, which can be resolved via proper statistical neutralization. Similarly, cold streaks are usually compounded by misfortune, which again can be accounted for by evaluating skills and not outcomes.
As for equity of playing time, some players are afforded more opportunity to generate data. So what? Everyone is judged on varying levels of playing time in-season; postseason stats are just an extension of this.
A major issue with considering postseason stats in projections is that they're derived under different conditions than the regular season. Small samples are always subject to bias, with the expectation they get balanced over the course of 162 games. At least anecdotally, batters face superior pitching in the playoffs, and vice versa. This season, the effect was softened with fewer off days, so teams were forced to use more than three starting pitchers.
The above is not a deal breaker. Aggregate regular-season hitting and pitching can be compared to the corresponding aggregate playoff stats and normalized. In fact, a similar normalization is necessary to account for the geographical nature of the 60-game season.
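As a rough sketch of that normalization, assume the adjustment is a simple ratio of the aggregate regular-season rate to the aggregate playoff rate, applied to a player's playoff rate. The function names and all of the rates below are hypothetical, chosen only to show the arithmetic; they are not actual 2020 figures, and this is not necessarily how any particular projection system handles it.

```python
def environment_factor(regular_rate: float, playoff_rate: float) -> float:
    """Ratio that rescales a playoff rate stat to the regular-season environment."""
    if playoff_rate <= 0:
        raise ValueError("playoff_rate must be positive")
    return regular_rate / playoff_rate


def normalize_playoff_rate(player_rate: float, regular_rate: float, playoff_rate: float) -> float:
    """Scale a player's playoff rate by the league-wide environment factor."""
    return player_rate * environment_factor(regular_rate, playoff_rate)


# Hypothetical league-wide home run rates (HR per plate appearance).
# These are illustration-only numbers, not real aggregates.
league_regular_hr_rate = 0.035
league_playoff_hr_rate = 0.028

# Hypothetical playoff home run rate for one hitter.
player_playoff_hr_rate = 0.050

adjusted_hr_rate = normalize_playoff_rate(
    player_playoff_hr_rate, league_regular_hr_rate, league_playoff_hr_rate
)
print(f"Playoff HR/PA translated to the regular-season environment: {adjusted_hr_rate:.4f}")
```

With these made-up inputs, homers were harder to come by in the playoffs, so the hitter's playoff rate is scaled up before being blended with regular-season data. The same ratio approach could be applied per stat.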
Something else that requires adjustment for the 2020 playoffs is ballpark neutralization. Once teams advanced past the wild-card round, they were subject to a bubble with each series played in the same ballpark. On paper, this isn't a problem since the factors for the respective venues can be applied in the neutralization process. However, 2020 poses an issue since the indices for Globe Life Field are essentially useless. Not only is a two-month sample insufficient to generate trustworthy data, the Rangers' new venue is essentially two yards, depending on whether the roof is open or closed.
The bottom line is most of the reasons for "No" can be cleverly dealt with via numerical adjustments. The translations are not perfect, but instead of comparing apples to oranges, it's Macintosh to Honeycrisp.
Even so, I'm still hesitant to roll 2020 playoff numbers into 2021 player projections on a global basis. Admittedly, going on an individual basis flies in the face of projection theory, but I see it as just another oddity of the 2020 campaign. As has been discussed in this space for the past month, more subjectivity than usual needs to be invoked when evaluating what happened during the past season. Counting or not counting playoff performance is just another factor to be considered.
The good thing is this doesn't have to be a mental exercise; historical data can be investigated to determine whether including playoff performance improves the efficacy of the ensuing year's player expectation. Looks like I have another project to add to my "Todd-do" folder.
To show how much of a difference this can make, let's tag in the impetus of this discussion, Randy Arozarena, and generate his projection based on various inputs. Please note this is my process and not that utilized in RotoWire's official site projections.
In brief, I use a weighted average of three years to generate a baseline. Due to the abbreviated 2020 season, I'm including 2017 but reducing its weight. Players appearing in Double-A and Triple-A between 2017 and 2019 have their stats translated via MLE (Major League Equivalency). All numbers are park corrected with age correction and regression towards expected statistics. To keep it simple, all of Arozarena's projections will be presented assuming 600 plate appearances.
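To make the idea of a weighted baseline concrete, here is a minimal sketch assuming a plain recency-weighted rate followed by regression toward a league rate. The weights, the stat lines, the league rate and the regression "ballast" are all hypothetical placeholders, not the values used in my process, and the park and age corrections are assumed to have been applied to the inputs already.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    year: int
    pa: int        # plate appearances (assumed already translated and park corrected)
    hr: float      # home runs (MLE-translated for minor-league seasons)
    weight: float  # hypothetical recency weight; the older season gets the least


def weighted_rate(seasons: list[SeasonLine]) -> tuple[float, float]:
    """Return (weighted HR per PA, weighted PA) across the input seasons."""
    weighted_hr = sum(s.weight * s.hr for s in seasons)
    weighted_pa = sum(s.weight * s.pa for s in seasons)
    if weighted_pa == 0:
        raise ValueError("no weighted plate appearances")
    return weighted_hr / weighted_pa, weighted_pa


def regress(rate: float, sample_pa: float, league_rate: float, ballast_pa: float) -> float:
    """Pull the observed rate toward the league rate by adding ballast_pa of league-average PA."""
    return (rate * sample_pa + league_rate * ballast_pa) / (sample_pa + ballast_pa)


# Hypothetical inputs for illustration only; not any real player's lines.
seasons = [
    SeasonLine(2017, 400, 8, 1.0),   # reduced weight for the extra, older season
    SeasonLine(2018, 400, 10, 2.0),
    SeasonLine(2019, 400, 12, 3.0),
    SeasonLine(2020, 100, 5, 4.0),
]

rate, sample_pa = weighted_rate(seasons)
regressed = regress(rate, sample_pa, league_rate=0.035, ballast_pa=1200)
projected_hr = regressed * 600  # scale to the 600 plate appearances used in this article
print(f"Projected HR per 600 PA: {projected_hr:.1f}")
```

Dropping a season from the baseline, as explored later, amounts to removing its entry from `seasons`; shifting emphasis between years amounts to changing the `weight` values.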
Making no adjustments, this is how my little black box envisions Arozarena's 2021 season:
AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|
525 | 137 | 20 | 83 | 60 | 23 | 0.260 | 101 |
Putting things in perspective, Arozarena's ADP in the early National Fantasy Baseball Championships' Draft Champions format is around 61. Based on my early rankings and the above projection, he's 101st.
Now let's float in his 2020 postseason numbers, adjusting for venue but not quality of competition.
AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|
526 | 144 | 24 | 88 | 64 | 24 | 0.274 | 46 |
As expected, there's marked improvement, especially in terms of power. That's what slamming 10 homers in 20 games will do. As mentioned, I'm not inclined, at least for now, to include playoff numbers, so I'll back them out.
While this is admittedly subjective, on several occasions, I'd eliminate minor-league stats for players who appeared to have improved since that season. Yes, it's cherry-picking, but if skills growth suggests it's warranted, I feel it's justified.
In 2017, Arozarena slashed .252/.366/.380 with Double-A Springfield. He was 22 years old at the time and still in the Cardinals organization. Even though this season is already lessened in the weighted average, I removed it from the baseline determination, yielding:
AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|
525 | 139 | 21 | 82 | 64 | 23 | 0.265 | 87 |
As you can see, this is a modest improvement over the initial projection, but it still didn't sit right with me. In 2018, still with St. Louis, Arozarena slashed .232/.328/.348 for Triple-A Memphis. Overlooking this results in the following:
AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|
525 | 152 | 25 | 89 | 70 | 25 | 0.290 | 24 |
The thing is, "Because he sucked," isn't adequate cause for omitting the season, so I kept it in. If I'm going to make a change, there needs to be a reason, even if it's more art than science.
That said, perhaps influenced by Arozarena's playoff heater, it can be argued some of his batted ball skills improved markedly over the summer. Specifically, both his average launch angle and his average exit velocity on fly balls increased. Both traits are obviously conducive to an increase in power.
On the other hand, Arozarena fanned more than usual. However, if more weight is given to the 2020 batted ball skills, the poorer plate skills come along for the ride and factor into the result. Here's a projection shifting the weighted average away from 2019, giving a little more credence to 2020 while leaving the MLE from 2018 alone:
AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|
524 | 137 | 24 | 86 | 66 | 23 | 0.261 | 69 |
This is where I currently stand. Let's put the five projections on the same chart to facilitate analysis.
Projection | AB | H | HR | R | RBI | SB | AVG | Rank |
---|---|---|---|---|---|---|---|---|
Initial | 525 | 137 | 20 | 83 | 60 | 23 | 0.260 | 101 |
With playoff stats | 526 | 144 | 24 | 88 | 64 | 24 | 0.274 | 46 |
Without 2017 MLE | 525 | 139 | 21 | 82 | 64 | 23 | 0.265 | 87 |
Without 2017 and 2018 MLE | 525 | 152 | 25 | 89 | 70 | 25 | 0.290 | 24 |
Altering 2020 and 2019 weights without 2017 MLE | 524 | 137 | 24 | 86 | 66 | 23 | 0.261 | 69 |
In full disclosure, the projection is a bit different from my initial drop on November 1. It was also derived without knowing the results of the initial NFBC drafts, which were made publicly available after the final adjustment.
The fifth iteration carries out the intent of the adjustment as homers increase while batting average drops, as it should with more strikeouts. It's interesting how close it is to the unabridged version. It's as if I took the initial projection and said, "I think Arozarena hits for more power than this."
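For anyone who wants to see the comparison in numbers, the short sketch below subtracts the initial projection from each of the other rows in the summary table. The values are copied from the table; the variable and function names are just illustrative.

```python
COLUMNS = ("AB", "H", "HR", "R", "RBI", "SB", "Rank")

# Rows copied from the summary table (AVG omitted since it follows from H and AB).
projections = {
    "Initial": (525, 137, 20, 83, 60, 23, 101),
    "With playoff stats": (526, 144, 24, 88, 64, 24, 46),
    "Without 2017 MLE": (525, 139, 21, 82, 64, 23, 87),
    "Without 2017 and 2018 MLE": (525, 152, 25, 89, 70, 25, 24),
    "Altering 2020 and 2019 weights without 2017 MLE": (524, 137, 24, 86, 66, 23, 69),
}


def deltas(label: str, baseline: str = "Initial") -> dict[str, int]:
    """Difference between one projection and the baseline, column by column."""
    base = projections[baseline]
    return {col: new - old for col, new, old in zip(COLUMNS, projections[label], base)}


for label in projections:
    if label != "Initial":
        changes = ", ".join(f"{col} {diff:+d}" for col, diff in deltas(label).items())
        print(f"{label}: {changes}")
```

For the fifth row, hits are unchanged from the initial projection while homers rise by four, which is the "same hitter, more power" shape described above. A negative Rank change means a better ranking.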
It's fair to question if I had a target in mind and all the manipulations were an artificial means of landing at the desired point. Honestly, I'm not sure I can categorically deny it. I think there's more to it than deciding on a final projection and fudging the inputs to get there. However, the ultimate manipulations can all be supported, even if only with narrative. Still, they were accepted because the projections passed the sniff test.
That said, is there anything wrong with that? Which do you prefer, projections/rankings done blindly, without interjecting some personal seasoning, or a formulaic basis with logical and defensible alterations?