Supermarginal: The Advantage of the “Dumb” Average

Supermarginal will be a weekly column that will shine a rogue economics spotlight on fantasy sports.

By Dan Gardner

When I was in high school, my interest in fantasy baseball was more cursory than it is now. Rather than pore through the data on Fangraphs.com, I typically gleaned who was doing well by perusing the daily leaderboard in the sports section of the newspaper.

One could always get a good idea of the leading MLB HR totals in a given year by simply glancing at the end-of-season totals from the previous season.

These days, popular projection systems like Marcel the Monkey take a different approach. Those familiar with these projections will know that they use a player’s output in each category over the previous three years to forecast future performance. Loosely speaking, this is a form of autoregression: a player’s own past values are used to predict his future ones.

Projection systems also regress each hitter’s data toward the mean. Baseball research has demonstrated that we need to watch how a player performs over several years before we can be confident in his abilities. When we do not have that much data on a hitter, the responsible thing to do is to project him as a league-average hitter going forward, with a thumb on the scale for the quantity and quality of his past performance.
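For readers who want to see the mechanics, here is a minimal sketch of a weighted three-year average that is then regressed toward the league mean. The function name `project_rate`, the 5/4/3 season weights, and the 1,200 "phantom" league-average plate appearances are all illustrative assumptions in the spirit of Marcel-style systems, not the specification of Marcel or CHONE, and the stat lines are made up.

```python
def project_rate(seasons, league_rate, weights=(5, 4, 3), regression_pa=1200):
    """Project a per-plate-appearance rate from up to three seasons.

    seasons: list of (events, plate_appearances), most recent season first.
    league_rate: league-average events per plate appearance.
    weights and regression_pa are illustrative assumptions, not real settings.
    """
    # Recent seasons count more heavily than older ones.
    weighted_events = sum(w * ev for w, (ev, _pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_ev, pa) in zip(weights, seasons))
    # Blend in phantom league-average plate appearances. The less real data
    # a hitter has, the more these phantom PAs dominate the result.
    return (weighted_events + league_rate * regression_pa) / (weighted_pa + regression_pa)


# Hypothetical HR data: (home runs, plate appearances), most recent first.
veteran = [(40, 600), (45, 620), (38, 580)]
rookie = [(10, 150)]  # same raw pace as the veteran, far less data
league_hr_rate = 0.03  # assumed league-average HR per PA

print(round(project_rate(veteran, league_hr_rate) * 600, 1))
print(round(project_rate(rookie, league_hr_rate) * 600, 1))
```

Both hitters have homered at a similar raw pace, but the rookie's projection lands much closer to league average because the phantom plate appearances outweigh his short track record.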

CHONE's '10 Leaderboard: Howard (44), Fielder (41), Pujols (40), Teixeira (34), A-Rod (34)

One result of this practice is that the projected leaders in a given category will always have less impressive statistics than the actual leaders from previous years. For example, even though the #1 home run hitter in each of the past five years averaged 51.6 homers, Sean Smith’s CHONE system projects only 44 homers for its 2010 leader, Ryan Howard.

However, if you believe that Ryan Howard will lead the league in homers in 2010, there’s a strong case to be made that he will hit 52 homers instead of 44 homers.
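A small simulation shows why a projected leaderboard tops out below the real one. The sketch below assumes a made-up pool of hitters whose "true talent" HR totals are known exactly, plus normally distributed season-to-season luck. The talent values, the `noise_sd` of 6 homers, and the function name `simulate_leader_gap` are all hypothetical choices for illustration.

```python
import random


def simulate_leader_gap(talents, noise_sd=6.0, seasons=2000, seed=0):
    """Compare the best projection with the average actual league leader.

    talents: each hitter's expected HR total (an idealized projection).
    noise_sd: assumed standard deviation of random season-to-season luck.
    """
    rng = random.Random(seed)
    # Even a perfect projection system can only list the highest expectation.
    projected_leader = max(talents)
    total = 0.0
    for _ in range(seasons):
        # Each simulated season is talent plus luck; the leader is whoever
        # ends up on top, usually a strong hitter who also got lucky.
        actual = [t + rng.gauss(0, noise_sd) for t in talents]
        total += max(actual)
    return projected_leader, total / seasons


# Hypothetical talent pool: a few sluggers and many ordinary hitters.
talents = [40, 38, 36, 35, 34, 33, 32, 31, 30, 30] + [25] * 40
projected, actual_avg = simulate_leader_gap(talents)
print(projected, round(actual_avg, 1))
```

Under these assumptions the average league-leading total sits well above the highest individual projection, even though every projection is exactly right on average. The gap comes from taking the maximum of many noisy outcomes, not from any flaw in the individual forecasts.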

Marcel and CHONE use an individual projection method, but there’s another approach that resembles the one that many of us used to use when reading the sports section — the historical average method.

Let’s compare the 2009 leaderboard totals (top 50) in the standard five hitting categories to two projected leaderboards. The first projection uses CHONE, and the second uses a simple average of the top totals from 2006-2008 (which I’ll call 3YR). Which did better at projecting leaderboard values in 2009? I measured four values for each category: the mean distance from the actual result (which captures bias, since overestimates and underestimates cancel out) and the mean absolute distance from the actual result (which captures overall accuracy), each computed for both CHONE and 3YR. Here are the results:

[Figure: picture-3]
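The 3YR leaderboard and the two error measures described above can be sketched in a few lines. The season totals below are toy numbers chosen for easy arithmetic, not real MLB data, and `regressed` is a hypothetical flattened leaderboard standing in for a regression-based system. The function names are my own.

```python
def rank_average(seasons, top_n):
    """Average the Nth-highest total across seasons, for ranks 1..top_n."""
    boards = [sorted(season, reverse=True)[:top_n] for season in seasons]
    # zip(*boards) groups the rank-1 totals together, then rank-2, and so on.
    return [sum(vals) / len(vals) for vals in zip(*boards)]


def mean_error(projected, actual):
    """Signed mean distance; negative means the projection ran low (bias)."""
    return sum(p - a for p, a in zip(projected, actual)) / len(actual)


def mean_abs_error(projected, actual):
    """Mean absolute distance; misses in either direction both count."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


# Toy HR totals for three prior seasons (not real data).
prior_seasons = [
    [50, 45, 41, 40, 38],
    [52, 46, 44, 39, 37],
    [48, 47, 41, 41, 36],
]
actual = [51, 45, 43, 40, 36]      # toy "next season" leaderboard
regressed = [44, 42, 40, 38, 37]   # hypothetical flattened projection

three_yr = rank_average(prior_seasons, top_n=5)
for name, board in [("3YR", three_yr), ("regressed", regressed)]:
    print(name, mean_error(board, actual), mean_abs_error(board, actual))
```

Note that the comparison is by rank only: the sketch never asks which player finished first, just what the first-place total was.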

BIG DISCLAIMER before we begin the analysis: CHONE does not claim to project accurate leaderboards, so in a sense we’re comparing apples and oranges here.

Nonetheless, if you are basing your projected team totals using CHONE or another similar projection system, it may surprise you to know how skewed the final results will be from the initial projections.

In four out of five cases, 3YR projects leaderboards with far less systematic bias than CHONE. The benefits of the historical average method were particularly salient in forecasting AVG and HR leaders. It’s scary how accurately you can estimate the Nth-highest HR total just by looking at what that total has been in the previous few years.

For fantasy purposes, the weakness of using CHONE to make player projections is most noticeable in the SB category. The SB leaderboard projected by CHONE is extremely flat compared to what actually occurs: CHONE underestimated the totals of the top 3 SB leaders by an average of 14 steals. When a huge portion of a category’s production comes from the top handful of players, a regression-oriented system is likely to undervalue those players systematically. (Slight digression: How, then, does CHONE overestimate SB totals across the leaderboard by an average of nearly 5 SB per rank? The inclusion of AAA speedsters who have no business getting significant MLB playing time has a lot to do with that.)

Why does this matter?

If you believe Ryan Howard is the premier HR hitter in MLB, and you’re trying to figure out how to balance your team properly, you want to know whether he’s going to hit 52 HR or 44 HR. CHONE and other systems have a difficult time acknowledging this “macro”-level reality.

What’s an owner to do?

Perhaps people should use the historical average method in combination with their subjective rankings of players. One could take the historical average fantasy values for each position and then simply rank the players at each position in order to create a draft day cheat sheet. This method was espoused in one of the great books about fantasy football, Drafting to Win by Robert Zarzycki. While the larger number of positions in fantasy baseball makes this method more complicated than it is in fantasy football, the idea of forming player preferences as opposed to full player projections offers several potential advantages.

As humans, we have a very difficult time with numbers. Without much thought, I can tell you that I prefer eating pizza to eating cauliflower. But how much pleasure do I derive from eating each? I have no idea. So focusing our draft preparation on rankings instead of projections would be easier and probably more fun (unless you have a naughty numbers fetish).

In a similar vein, this method has the potential to be a huge time-saver for fantasy owners looking for a scalable method to apply to drafting in several different leagues. By using league rules to calculate historical fantasy values and then intuitively slotting players into those values, we can quickly place a reasonable value on the expected production from the 6th best 3B in any league. This offers a consistent and time-efficient alternative to making haphazard upward or downward adjustments to a player’s value as you move from one draft to the next.
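As a rough illustration of the slotting idea, the sketch below pairs a subjective ranking with historical values by rank. The player names, the dollar values, and the function name `build_cheat_sheet` are all hypothetical. In practice the historical values would come from your own league's rules and past results.

```python
def build_cheat_sheet(ranked_players, historical_values):
    """Give the i-th ranked player the historical i-th best value.

    ranked_players: names in your own preference order, best first.
    historical_values: past values at this position, in any order.
    """
    slots = sorted(historical_values, reverse=True)
    # zip stops at the shorter list, so unranked slots are simply left out.
    return dict(zip(ranked_players, slots))


# Hypothetical average values of the top four 3B in one league's history.
third_base_values = [17, 30, 24, 20]
my_ranking = ["Player A", "Player B", "Player C", "Player D"]

sheet = build_cheat_sheet(my_ranking, third_base_values)
for player, value in sheet.items():
    print(player, value)
```

Moving to a league with different rules only means swapping in a different list of historical values. The ranking itself carries over unchanged.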

Further, if we forgo the process of taking a computer’s player projection and adjusting it to suit our own views, we avoid the risk of systematic bias cropping up in our projections.

For example, unusual news is far more memorable to us than other types of news. In the context of fantasy baseball, consider a situation in which there are two kinds of hitters we can draft — good and mediocre — and two kinds of news we can receive about those players — positive and negative.

If mediocre players outnumber good players, and positive news stories outnumber negative news stories, then negative news about a good player will make a disproportionately large impression on our minds, and we will tend to underestimate that player going into the season (e.g. Chase Utley in 2009). While the historical average method does not completely eliminate this problem, it does minimize the influence that dramatic changes in valuation have on our projected statistics as a whole. If we reduce Utley’s projected value, we will simultaneously boost the values of other players. While this may be counterintuitive, league-wide totals are more consistent than individual player totals, and making sure they don’t get out of whack should be a priority.
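The difference between an ad hoc downgrade and a re-ranking can be shown with a toy example. The values below are made up, and "Player B" and "Player C" are placeholders. The point is only that moving a player down a fixed set of slots leaves the pool's total untouched, while cutting his projection in isolation shrinks it.

```python
# Hypothetical starting values for three players at one position.
projections = {"Utley": 30, "Player B": 24, "Player C": 20}

# Ad hoc approach: knock 10 off Utley and leave everyone else alone.
adjusted = dict(projections, Utley=20)

# Slotting approach: keep the same three value slots, change only the order.
slots = sorted(projections.values(), reverse=True)
new_ranking = ["Player B", "Player C", "Utley"]
reranked = dict(zip(new_ranking, slots))

print(sum(projections.values()), sum(adjusted.values()), sum(reranked.values()))
```

In the re-ranked version, Utley's drop is exactly offset by the two players who move up a slot, which is the counterintuitive boost described above.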

2 Comments

  1. Millsy  •  Nov 19, 2009 @8:42 am

    Just a curiosity about the X-axis on the BA graph above. Are the 1-2-3-4-5 just bins of players that are averaged? So Bin 1 is the top X-number of players, Bin 2 is the next percentile bunch, etc.? Are the bins sized the same? Or is each 1-2-3-4-5 just the 1st Ranked, 2nd Ranked…player in the category for each season? Maybe I missed something obvious, but it’s not clear to me. This is a great point about the CHONE projections. I was checking them out deciding on Keepers this week and started getting depressed. I knew they undershot, but I didn’t realize it was so significant.

  2. Dan  •  Nov 19, 2009 @8:46 am

    It’s simply looking at 1st ranked, 2nd ranked, 3rd ranked, etc.

  • About this blog

    Fantasy Ball Junkie is a blog for advanced fantasy baseball enthusiasts who want to get an edge on competition. The site focuses on strategy, player evaluation, transactional analysis, bargaining theory, and all the skills integral to having a successful season. I can be reached with tips, requests, or abuse at editor@fantasyballjunkie.com
