Correlations

In many cases, we’re interested in whether one variable is correlated with another. For example, do drivers who finish well in the Daytona 500 go on to have a good season? In other words, could you predict whether a driver will have a good season based on his or her finish at Daytona?

I’m explaining correlations in terms of this question, but the same ideas apply to any pair of variables.

Types of Correlations

If Daytona were perfectly predictive, then the driver who finished first at Daytona would win the championship, the driver who finished second would finish the season in second place, and so on. A plot of each driver’s season-ending rank vs. his or her Daytona finish would then fall on a perfectly straight line, like this:

An example of perfect correlation
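One common way to put a number on how closely two rankings agree is Spearman’s rank correlation coefficient, ρ, which is built for ranks like these. The minimal Python sketch below assumes there are no ties and uses the shortcut formula ρ = 1 − 6Σd² / (n(n² − 1)), where d is the gap between a driver’s Daytona finish and season-ending rank. The function name spearman_rho and the 10-driver field are made up for illustration.

```python
def spearman_rho(race_finish, season_rank):
    """Spearman's rank correlation for two equal-length lists of untied ranks."""
    n = len(race_finish)
    if n != len(season_rank) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    # Sum of squared differences between each driver's two ranks
    d_squared = sum((r - s) ** 2 for r, s in zip(race_finish, season_rank))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Perfect correlation: the Daytona winner wins the title, 2nd finishes 2nd, and so on
race = list(range(1, 11))
season = list(range(1, 11))
print(spearman_rho(race, season))  # 1.0
```

A perfect match gives 1.0. If the order were exactly reversed (the Daytona winner finishes last in points, and so on), ρ would be −1.0, which is also perfectly predictive, just backwards.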

If we actually saw this in the real data, the only reasonable interpretation would be that NASCAR is fixed. (And we know it’s not, so we expect some scatter.) If there were a strong correlation, we’d more likely see something like this:

An example of strong, but not perfect correlation

There’s a clear overall trend, even though there’s some scatter in the data. The more scatter, the less correlation and the less predictive ability.

One more practice run. Here’s the same data as above, but I’ve introduced some luck — good and bad. In this simulation, some of the top season finishers wrecked out of the race and a few of the backmarkers got lucky.

An example of strong correlation with a few outliers thrown in.

The overall strong trend is still there, but a couple of data points (the highlighted ones) are way off the line. The data points below and to the right of the line are drivers who did worse at Daytona than their season-ending rank would predict. For example, the season champion gets caught up in an early accident and finishes dead last. The data points above and to the left of the line are drivers who overperformed at Daytona.

And then, if you get a graph that’s pretty much random, like this:

This tells you that there’s essentially no correlation: a driver’s Daytona finish says nothing useful about how his or her season will go!

And remember that just because two things are correlated doesn’t mean one causes the other.
