Why I Don't Do Projections: Chris Liss States His Case Against Projections

Updated on January 17, 2011 5:03PM EST

Virtually every fantasy baseball publication creates projected stats for the entire relevant player pool in the commonly used categories. RotoWire is no different, and while that thankless job falls on Jeff Erickson's plate, I've run them through our dollar value formula enough times to form a few ideas about the process. For starters, it's not obvious what we're asking Erickson to do as he creates the fictional 2011 seasons for everyone. Is he supposed to take three-year averages and simply adjust for park, health, playing time and age-related growth/decline?
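To make that question concrete, here is a minimal sketch of the "three-year average plus adjustments" recipe. The function name, the multipliers and the strikeout totals are all hypothetical, chosen only for illustration; this is not RotoWire's actual method.

```python
from statistics import mean

def naive_projection(last_three_seasons, park=1.0, health=1.0,
                     playing_time=1.0, age=1.0):
    """Three-year average scaled by hypothetical adjustment multipliers."""
    baseline = mean(last_three_seasons)
    return baseline * park * health * playing_time * age

# Made-up strikeout totals for an illustrative pitcher, not real data.
strikeouts = [120, 140, 160]

print(naive_projection(strikeouts))            # no adjustments at all
print(naive_projection(strikeouts, age=1.05))  # assumed 5% age-related growth
```

Every multiplier here is a judgment call, which is where the trouble starts.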

Maybe that's a starting point, but do all players of the same age grow or decline at the same rate? Probably not. (Let's lump age and experience together for the sake of simplicity, even though in reality they're two separate factors.) You'd think a blue-chip prospect with great raw stuff like Brian Matusz would have more upside than a lesser prospect like Brett Cecil, even though they're the same age and their seasons last year (aside from the wins) were fairly similar. So in addition to the complex task of taking historical data on specific players and adjusting it for several hard-to-quantify variables, you must also consider how players of various types have grown or regressed historically and decide which player corresponds to which type.

But historical type is not easy to identify. Matusz's top-three current comps according to Baseball-Reference.com are Jonathon Niese, Wade LeBlanc and Bradley Bergesen. Cecil's comps are Tommy Hunter, Glen Perkins and Chuck James. You can see how this is not much help in projecting these guys for 2011. In the end, you're probably going to give Matusz more growth because he's more likely to be a staff ace based on pedigree (and his dominance down the stretch), but not be too generous because there's huge downside for every young pitcher, especially one in the AL East. How you strike this balance is largely subjective, but if you're conservative, i.e., if you want to account for both upside and downside without taking a position on which outcome will actually occur, you might give Matusz 12 wins, 160 Ks and an ERA around 3.85. If you do the same with Cecil, you'll probably give him 11 wins, 135 Ks and an ERA around 4.15.

The difference between the two won't be that great, and Matusz will likely cost a few dollars more when you run them through your dollar-value formula. But is this what we really want from a set of projections? Everyone getting their average season based on their histories, adjusted slightly for this year's circumstances? You can see how it condenses the values with Cecil and Matusz, as the downside is roughly the same - any young pitcher in the AL East can be a disaster, and Matusz's superior upside is weighed down by that. Multiply this across the player pool, and you have projections that (a) are timid (no one hits 50 homers or wins 20 games) and (b) don't look like the actual distribution of stats in a real season, where plenty of players bomb completely or get hurt, and others have their 90th percentile seasons (or, in Jose Bautista's case last year, arguably their 99.99th percentile ones).
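A rough sketch of that condensing effect, under loud assumptions: the nine-hitter pool, the 8-homer standard deviation and the normal shape of each player's outcomes are all invented for illustration. The only point is that a sheet of 50th percentile numbers is narrower than the range of seasons those same players could actually have.

```python
from statistics import NormalDist, pstdev

# Hypothetical pool: nine hitters whose median (50th percentile) home run
# projections run from 10 to 34.
median_hr = [10, 13, 16, 19, 22, 25, 28, 31, 34]

# Assumption: each hitter's season is normally distributed around his median
# with a standard deviation of 8 homers (luck, health, playing time).
SEASON_SD = 8.0
PERCENTILES = [0.05, 0.25, 0.50, 0.75, 0.95]

def season_outcomes(median, sd=SEASON_SD):
    """Home run totals at each listed percentile, floored at zero."""
    dist = NormalDist(mu=median, sigma=sd)
    return [max(0.0, dist.inv_cdf(p)) for p in PERCENTILES]

# The projection sheet lists only each player's 50th percentile.
projection_sheet = [float(m) for m in median_hr]

# A real season draws on the whole range of percentiles across the pool.
possible_seasons = [hr for m in median_hr for hr in season_outcomes(m)]

print("spread of projections:     ", round(pstdev(projection_sheet), 1))
print("spread of possible seasons:", round(pstdev(possible_seasons), 1))
print("best projected total:", max(projection_sheet))
print("best possible total: ", round(max(possible_seasons), 1))
```

In this toy pool, nobody on the projection sheet tops 34 homers, yet the top hitter's 95th percentile season is well beyond that. The sheet is timid by construction.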

The alternative is also bad. Instead of making projections with necessarily subjective growth inputs that create a fictional distribution of player stats that we know will not look anything like the actual one, we can make predictions - and actually give Matusz a Year 3 breakout: 14 wins, a 3.30 ERA and 195 Ks. We're going to predict Matusz will have a big year and project him accordingly. Those are some great projections - ones I can actually use at my draft! But the problem then is why Matusz is getting his 75th percentile projection, i.e., a far more favorable than average outcome for him, when Cecil gets his 50th percentile one. And once we start giving out 75s and 25s for various players rather than 50s across the board, aren't we just pretending we're psychic? Why are we taking a position on whether a player will exceed or fall short of his proper mean (50th percentile) projection? Isn't that like saying, "My pocket sevens are worth more here because I know a seven will fall on the flop?" Shouldn't we be agnostic as to the flop, and value our holdings accordingly?

Or if you really think Matusz will have the year I described as his 75th percentile one, you could argue that my numbers are wrong, and that's really his 50th percentile one. But if 195 Ks is his 50th percentile projection, and an arm injury is his zeroth percentile one, then what does it mean for him to have a 75th percentile one? If he does better than his average in that case, he's going to be up for the Cy Young award. That sounds far-fetched given his age, division and supporting cast, but it's hardly unprecedented. Look at the leap Tim Lincecum made at the same age from 2007 to 2008.

We have two major problems here: (1) Even if we knew for sure what Matusz's 50th percentile number was, we'd have no way of knowing what his actual season would be due to variance (BABIP, HR/FB, injury luck, etc.); and (2) even if we knew his actual 2011 numbers in advance, we would have only a rough idea what his 50th percentile season should have been because it's hard to say whether that's the season he should have been expected to have. Looking back on Bautista's 2010, what should his mean projection have been? Thirty-two homers? How could anyone in his right mind project the 29-year-old Bautista, given his track record and the normal historical development of players, for that many homers? We can't make Bautista play a million seasons at age 29 and take the mean, so there's no way to know whether 2010 was a one-in-a-million season or just his 90th percentile one which, while still unlikely, was there if anyone had read the signs.

I suppose some of these problems could be solved if historical comps were so good that growth and regression at certain age and experience levels given different skill sets could be predicted precisely for each player, and you could know what his luck-neutral expected output was. But because every player (unlike cards in poker) is unique and facing unique circumstances, this kind of precision based on precedent seems unlikely. One could argue that players are not that unique in the relevant ways, i.e., they might all look different or have different personalities, but in terms of throwing motion, handedness, pitch repertoire, velocity, command, etc. they can be categorized closely enough to present reliable precedents for those of a similar type.

But even then, you'd run into the problem of categorizing players in the right group. Who knew three years ago that Cliff Lee would be in the same group as Roy Halladay (assuming lefties and righties can be grouped together)? The bottom line - projections (with a good dollar-value formula) are useful to get a ballpark understanding of what certain stat lines are roughly worth, and how giving a player 30 rather than 35 homers changes his dollar value. But once you've done that a few times and internalized a sense of it, these fictional renderings (which are flawed even before we address the huge problem of variance, i.e., dumb luck) should not be taken too seriously.

First, the growth/regression element is necessarily subjective. Second, even players who are in a plateau phase of their careers, e.g., 30-year-old pitchers, have indeterminate baselines because it's impossible to separate skill from luck entirely. For example, if you think BABIP is pure luck and .300 is the norm, then you'd probably take +105 on Aaron Harang (.318 BABIP over 1451.2 career IP) having a lower BABIP this year than Mariano Rivera (.273 BABIP over 1150 career IP). If so, please let me know.
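For anyone unfamiliar with the notation, here is the arithmetic behind that offer, assuming +105 is a standard American moneyline (risk $100 to win $105). The 50 percent win probability is what the pure-luck view implies, not a claim about either pitcher, and the function names are just for illustration.

```python
def implied_probability(moneyline):
    """Break-even win probability for an American moneyline."""
    if moneyline > 0:
        return 100 / (moneyline + 100)
    return -moneyline / (-moneyline + 100)

def expected_profit(moneyline, win_prob, stake=100.0):
    """Average profit per bet at the given win probability."""
    if moneyline > 0:
        payout = stake * moneyline / 100
    else:
        payout = stake * 100 / -moneyline
    return win_prob * payout - (1 - win_prob) * stake

# At +105 the bettor needs to win a little under half the time.
print(round(implied_probability(105), 4))  # 0.4878

# If BABIP were pure luck, Harang vs. Rivera would be a coin flip.
print(expected_profit(105, 0.5))  # 2.5
```

A pure-luck believer sees a positive expected return on a coin flip. Anyone who hesitates is conceding that Rivera's .273 is at least partly skill.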

If you concede BABIP is not always luck, then figuring out the line between luck and skill is much harder. And that's just one example. Projections and the dollar values they generate can be helpful if you're the kind of person who wants the extra discipline of knowing exactly how much you're willing to pay for a player in advance (though in that case, I'd suggest you plug them into RotoWire's Draft Software because you'll want dynamic rather than static dollar values as your auction goes on). But I personally prefer a fairly well-ordered list so I can see who's gone and how deep each position goes, and I'll make my bids based on my research and my instincts at the draft. That's admittedly subjective, but no more so than the projector whose subjectivity occurs either in estimating player growth rate, categorizing a player as a certain type or simply adjusting for hard-to-quantify factors like health status, repertoire/stance changes or personal circumstances.