Predictions: The Peril of the Early-Season Off-Week
I can post only when I have something to contribute. The professionals who cover NASCAR don’t have the luxury of waiting until there’s news. The content monster remains perpetually hungry. That need to fill space is probably why, every year about this time, we see predictions that try to draw conclusions about the entire season based on the first few races.
This is almost never a good idea. You wouldn’t, based on the first four races, predict that Austin Dillon would win nine races this year and Kevin Harvick the other 27 just because Dillon won one race and Harvick three, would you?
Remember back in 2012, when everyone was panicking because cautions were way down? I wrote a post then (Why You Can’t Predict Anything Based on the First 10 Races) warning that it was a mistake to make predictions based on 10 races, but particularly on the first 10 races. But here we are doing it again.
Will 2018 Set a Record for Fewest Cautions?
I have no idea.
Neither does anyone else.
Worries about an anomalously low caution rate are being sounded again this year, spurred in part by a Martinsville race that Kyle Petty called
“the most non-aggressive Martinsville race I’ve probably ever seen in my life… And I’ve been going to Martinsville for a long, long time.”
Let’s look at the evidence.
- NBC Sports points out that there have been an average of 5.3 cautions per race, the lowest since 1999
- There have been 32 cautions so far this season, the same number as at this point in 1999, while there were 66 at this point in 2008.
Here’s the graphic NBC Sports showed.
You know me: I don’t take anyone’s word for anything. I made my own plot.
The first thing you’ll see is that the TV people omitted some data. When you’re only showing a graph for 15 seconds, you can’t show anything very complicated because there’s just too much for people to take in.
The problem is that omitting that data also omits some important information. The problem is made worse by drawing lines between data points, because that suggests that the data behave the way the line indicates. Now that you’ve seen the entire data set, you know that the data are not as well behaved as the graph would have you think.
- This graph makes it look like the cautions went up for a while and then went down starting in 2008.
- While there is an overall trend toward lower numbers of cautions since 2008, it’s not monotonic (monotonic would mean the number never goes back up from one year to the next).
- There were the same number of cautions in 2012 as there were in 2008.
- The tracks being run and the lengths of the races changed from year to year. If you run 400 miles at one track one year and 500 miles the next year, you’re going to have more cautions in the longer race. These numbers don’t take those factors into consideration.
- At the end of 2017, people were wailing about cautions having gone way up. That larger number of cautions in 2017 is omitted from the graph.
Comparing Apples and Oranges — Or Giraffes
I’ve made the argument before that you cannot look at cautions in terms of something as simple as absolute numbers of cautions and suggested that the appropriate measure is cautions per 100 miles of racing. Here’s the graph for the first six races, followed by my justification of this metric.
The Number of Races Run per Season Varies
We’ve only been running 36 races per season since 2001, so comparing absolute numbers only works from 2001 on. I realize that doesn’t make a difference in graphs of the first six races, but you’re predicting the entire season and it does make a difference if you run 34 vs. 36 races.
The Lengths of the Races and the Tracks Run Vary
- Even when there are the same number of races, the order of the tracks changes
- Some races are rained out; others go into overtime
- The lengths of races have changed over the years
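The normalization itself is a single division. Here is a minimal sketch in Python; the function name and the two sets of totals are made up for illustration and are not taken from any real season. It shows how the same caution count turns into a different rate when 300 fewer miles have been run.

```python
def cautions_per_100_miles(cautions: int, miles: float) -> float:
    """Return the caution rate normalized to 100 miles of racing."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return 100.0 * cautions / miles


# Hypothetical totals, chosen only to illustrate the effect of race length:
# identical caution counts, but one schedule is 300 miles shorter.
longer_schedule = cautions_per_100_miles(cautions=45, miles=2700.0)
shorter_schedule = cautions_per_100_miles(cautions=45, miles=2400.0)

print(f"Longer schedule:  {longer_schedule:.2f} cautions per 100 miles")
print(f"Shorter schedule: {shorter_schedule:.2f} cautions per 100 miles")
```

Comparing raw counts would call these two seasons identical. The rate shows that the shorter schedule packed the same number of cautions into less racing.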
Compare the cautions per 100 miles graph and the absolute cautions graph and you’ll notice some differences. Look at 2010 and 2011, for example. The points change because the Fontana race changed from 500 to 400 miles and Atlanta was replaced by Phoenix. We’d run 300 fewer miles in 2011 after six races than we had in 2010. The table below lists the first six races those years.
We’ve run the same tracks with the same race lengths since 2015. Before that, Bristol was one of the first six races and you can imagine that’s going to change the information for the first six races. So here’s the Cautions per 100 miles after six races.
What About Stage Racing?
As we discussed in another blog, the advent of stage racing has changed things. We have fewer debris cautions and more stage cautions. I’m ignoring that here, just because it’s hard to control for that.
But 2018 Does Have Record Low Cautions, so Far, Right?
Technically, no. If you correct for some of the above considerations, 2018 has 1.34 cautions per 100 miles while 1998 has 1.27 cautions per 100 miles. The next lowest is 2016 with 1.42 cautions per 100 miles.
But the number of cautions is definitely lower than it was last year and slightly lower than in 2016, so let’s look at that. When did that happen? Let’s start with looking at the cautions after one race.
That’s a very different graph, isn’t it? 2018 is nowhere near the lowest. Let’s look at some other stopping points.
So 2018 wasn’t anomalously low after 3 races or 5 races. In fact, 2018 only became the “lowest-caution season” after the sixth race. What we’re seeing isn’t indicative of the 2018 season; it’s indicative of an extraordinary Martinsville race.
Blame 2018 on Martinsville
Here’s a graph of cautions for Martinsville since 1980. I separated out the Spring and Fall races, mostly because it made a very pretty graph.
We can make a histogram of cautions to see how they behave.
You’ll notice from the top graph that there have been more cautions in the last few years, so I made the same graphs for just 2001-2018.
Looking at 1980-2018:
- Martinsville averages 10.76 cautions for the Spring race and 12.50 cautions for the Fall race
- The races with the lowest numbers of cautions are 4 (Spring 2018) and 5 (Fall 2016)
- The races with the highest numbers of cautions are 18 (Spring 2008) and 21 (Fall 2007)
- Martinsville has a really high standard deviation: 3.45 (Spring) and 3.83 (Fall).
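The standard deviation gives you the range in which the number of cautions is most likely to fall. For data that are roughly normally distributed, about 68% of values land within one standard deviation of the average and about 95% land within two.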
For the Spring race
- About 68% of the races should have between 7 and 14 cautions.
- About 95% of the races should have between 4 and 18 cautions.
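These ranges follow directly from the Spring average and standard deviation quoted above. The sketch below reproduces them, assuming the caution counts are close enough to a normal distribution for the 68% and 95% rules of thumb to apply. The helper names are my own.

```python
from typing import Tuple

# Spring race figures for 1980-2018, as quoted in the text.
SPRING_MEAN = 10.76
SPRING_STD_DEV = 3.45


def sigma_range(mean: float, std_dev: float, n_sigma: float) -> Tuple[float, float]:
    """Return the interval mean +/- n_sigma standard deviations."""
    spread = n_sigma * std_dev
    return mean - spread, mean + spread


def z_score(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations a value sits from the mean."""
    return (value - mean) / std_dev


low_1, high_1 = sigma_range(SPRING_MEAN, SPRING_STD_DEV, 1)
low_2, high_2 = sigma_range(SPRING_MEAN, SPRING_STD_DEV, 2)

# Spring 2018 had 4 cautions.
spring_2018_z = z_score(4, SPRING_MEAN, SPRING_STD_DEV)

print(f"About 68% of races: {low_1:.2f} to {high_1:.2f} cautions")
print(f"About 95% of races: {low_2:.2f} to {high_2:.2f} cautions")
print(f"Spring 2018 sits {spring_2018_z:.2f} standard deviations from the average")
```

Rounded to whole cautions, the intervals are 7 to 14 and 4 to 18. The four cautions in Spring 2018 sit almost exactly two standard deviations below the average, right at the edge of the 95% range.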
I should note that the number of cautions has been higher in the later years. If you just look at 2001-2018, the average is 12.33, but it’s nowhere near a normal distribution, so I didn’t want to try to apply standard deviations to those numbers.
People came up with all kinds of explanations for what was going on:
- Drivers are better
- It’s all due to stage cautions, which have made debris cautions go down.
- Double-file restarts made the cautions go up
- Cars are easier to drive because of downforce changes
- Drivers have realized how important it is for the championship that they not crash out early
- Drivers have just gotten less aggressive
- Better tires, better engines, better brakes mean fewer parts failures
These things may be true; however, the reason 2018 has an abnormally low number of cautions is that the last Martinsville race was on the lowest edge of likely. If we’d had the average 12 cautions at Martinsville, there would be 40 cautions total, which is still on the low side, but is higher than it was in 2016. 2018 wasn’t anomalous after three or five races. The Martinsville race pushed it over.
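The what-if arithmetic is short enough to write out. This sketch uses only numbers quoted in this post (32 cautions through six races, 4 at Martinsville, a typical Martinsville race of about 12). The variable names are mine.

```python
# Figures quoted in the text.
RACES_RUN = 6
CAUTIONS_SO_FAR = 32
MARTINSVILLE_ACTUAL = 4
MARTINSVILLE_TYPICAL = 12  # roughly the average Martinsville race

# Swap the actual Martinsville count for a typical one.
adjusted_total = CAUTIONS_SO_FAR - MARTINSVILLE_ACTUAL + MARTINSVILLE_TYPICAL

actual_per_race = CAUTIONS_SO_FAR / RACES_RUN
adjusted_per_race = adjusted_total / RACES_RUN

print(f"Actual:   {CAUTIONS_SO_FAR} cautions, {actual_per_race:.2f} per race")
print(f"Adjusted: {adjusted_total} cautions, {adjusted_per_race:.2f} per race")
```

With only six races in the sample, one race moves the per-race average from about 5.3 to about 6.7.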
So This is Just Because of Martinsville?
No. As I’ve pointed out before, the caution numbers change significantly over the first 10-15 races. Let’s look at the cautions as a function of race number. Here are the numbers for 2015.
You see how the numbers change rapidly over the first 10 races, then settle in to what ends up being the final average around race 20 or so. Making any kind of prediction before that is a losing proposition.
The cautions don’t always go up, either. Here’s the 2011 data.
So let’s compare the first six races to the final values.
There’s some similarity between the two graphs, but you’ll notice that, for example, 2004 didn’t end up being as low as it was after 6 races; 2012 was high after 6 races, but ended up being very low.
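If you want to make this kind of plot yourself, one way is to track the cumulative caution rate after each race. The sketch below is an illustration under assumptions: the per-race cautions and mileages are made up, not real 2015 data, and it uses cautions per 100 miles as the running metric.

```python
from itertools import accumulate
from typing import List, Sequence


def running_rate(cautions: Sequence[int], miles: Sequence[float]) -> List[float]:
    """Return cautions per 100 miles accumulated through each race in order."""
    if len(cautions) != len(miles):
        raise ValueError("need one mileage per race")
    total_cautions = accumulate(cautions)
    total_miles = accumulate(miles)
    return [100.0 * c / m for c, m in zip(total_cautions, total_miles)]


# Made-up numbers for eight races, for illustration only.
cautions_by_race = [8, 5, 6, 7, 6, 4, 11, 9]
miles_by_race = [500.0, 500.0, 400.0, 300.0, 400.0, 250.0, 250.0, 500.0]

rates = running_rate(cautions_by_race, miles_by_race)
for race_number, rate in enumerate(rates, start=1):
    print(f"After race {race_number}: {rate:.2f} cautions per 100 miles")
```

Early in the list, each new race is a large share of the total, so the rate jumps around. As the miles pile up, any single race moves the running value less.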
The moral: Don’t try to extrapolate a whole season based on one extreme. Remember when we talked about the average age of the Daytona field and found that having one 66-year-old in the field of 40 drivers increased the average age of the field by one whole year? It’s the exact same thing here. We’ve got one really anomalous race and it skewed the numbers.
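The size of that effect is easy to work out: one value moves the average by its distance from the average of everyone else, divided by the number of values. In this sketch the field size and the 66-year-old come from the example above, but the average age of 30 for the other 39 drivers is a made-up figure for illustration.

```python
from statistics import mean


def shift_from_one_value(value: float, mean_of_others: float, count: int) -> float:
    """Return how far one value moves the overall mean away from the mean of the rest."""
    return (value - mean_of_others) / count


FIELD_SIZE = 40
OLDEST_DRIVER_AGE = 66.0
ASSUMED_MEAN_OF_OTHERS = 30.0  # hypothetical, not a real field average

shift = shift_from_one_value(OLDEST_DRIVER_AGE, ASSUMED_MEAN_OF_OTHERS, FIELD_SIZE)

# Cross-check by building the field explicitly and averaging it.
ages = [ASSUMED_MEAN_OF_OTHERS] * (FIELD_SIZE - 1) + [OLDEST_DRIVER_AGE]
shift_by_direct_mean = mean(ages) - ASSUMED_MEAN_OF_OTHERS

print(f"Shift from formula:     {shift:.2f} years")
print(f"Shift from direct mean: {shift_by_direct_mean:.2f} years")
```

Under that assumed average, one driver moves the field's average age by roughly a year. A single four-caution race in a six-race sample has an even bigger pull, because the sample is so much smaller.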
Let’s see what the numbers look like after Bristol. All we need is one crash fest and we’ll be right back to normal.
This Stuff Actually Matters
Yes, it’s a minor irritation when NASCAR stats are misrepresented; however, people do this in all kinds of areas from economics to politics. Knowledge really is power. Don’t be swayed just because someone is throwing numbers around. Make sure those numbers mean something.