Statistics for all: Prediction
Measurements are often used to make predictions. For example, they can help predict how tall a 4-year-old is likely to be in adulthood, which students are likely to do better in an academic program, or which candidates are most likely to succeed in a particular job.
Some of the attributes we measure are strong predictors; others are weaker. For example, a child’s height at age 4 is a pretty strong predictor of adult height. Parental height is a weaker predictor. The complexity of a person’s workplace decision making, on its own, is a moderate predictor of success in the workplace. But the relation between the complexity of their workplace decision making and the complexity of their role is a strong predictor.
How do we determine the strength of a predictor? In statistics, the strength of a prediction is represented by an effect size. Most effect size indicators are expressed as decimals and range from .00 to 1.00, with 1.00 representing 100% accuracy. The effect size indicator you’ll see most often is r-square. If you’ve ever been forced to take a statistics course ;), you may remember that r represents the strength of a correlation.
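If you’d like to see how r is computed in practice, here’s a minimal sketch in Python. The paired measurements are made up purely for illustration:

```python
import numpy as np

# Hypothetical paired measurements: entrance-exam scores and a later
# measure of success for the same eight people (made-up numbers).
exam    = np.array([52, 61, 75, 88, 93, 107, 115, 128])
success = np.array([48, 70, 66, 85, 90, 102, 118, 121])

r = np.corrcoef(exam, success)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.2f}, r-square = {r ** 2:.2f}")
```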
Before I explain r-square, let’s look at some correlation data. The four figures below represent four different correlations, from weakest (.30) to strongest (.90). Let’s say the vertical axis (40 to 140) represents the level of success in college, and the horizontal axis (50 to 150) represents scores on one of four college entrance exams. The dots represent students. If you were trying to predict success in college, you would be wise to choose the college entrance exam that delivered an r of .90.
Why is an r of .90 preferable? Well, take a look at the next set of figures. I’ve drawn lines through the clouds of dots (students) to show regression lines. These lines represent the prediction we would make about how successful a student will be, given a particular score. It’s clear that in the case of the first figure (r = .30), this prediction is likely to be pretty inaccurate. Many students perform better or worse than predicted by the regression line. But as the correlations increase in size, prediction improves. In the case of the fourth figure (r = .90), the prediction is most accurate.
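If you’d like to see this for yourself, here’s a minimal sketch in Python. The data are simulated and the sample size is made up; the sketch generates student data at three correlation strengths, fits a regression line, and reports the typical prediction error. Notice how the error shrinks as r grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of students

for r in (0.30, 0.60, 0.90):
    # Simulate standardized exam scores and college success with correlation r.
    cov = [[1.0, r], [r, 1.0]]
    scores, success = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # Fit the regression line (our prediction), then measure how far
    # actual outcomes fall from the predicted ones.
    slope, intercept = np.polyfit(scores, success, deg=1)
    predicted = slope * scores + intercept
    rmse = np.sqrt(np.mean((success - predicted) ** 2))
    print(f"r = {r:.2f} -> typical prediction error (RMSE) = {rmse:.2f}")
```

With standardized scores like these, the typical error works out to roughly the square root of the unexplained variance, which is why the r = .90 case lands near .44 (a number we’ll meet again below).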
What does a .90 correlation mean in practical terms? That’s where r-square comes in. If we multiply .90 by .90 (calculate the square), we get an r-square of .81. Statisticians would say that the predictor (test score) explains 81% of the variance in college success. The 19% of the variance that’s not explained (1.00 - .81 = .19) represents the percent of the variance that is due to error (unexplained variance). The square root of .19 gives the amount of error (about .44).
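In code, that arithmetic is just a few lines (a minimal sketch):

```python
r = 0.90
r_squared = r ** 2             # 0.81 -> 81% of the variance explained
unexplained = 1.0 - r_squared  # 0.19 -> 19% of the variance due to error
error = unexplained ** 0.5     # ~0.44, the amount of error
print(r_squared, unexplained, round(error, 2))
```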
Correlations of .90 are very rare in the social sciences—but even correlations this strong are associated with a significant amount of error. It’s important to keep error in mind when we use tests to make big decisions—like who gets hired or who gets to go to college. When we use tests to make decisions like these, the business or school is likely to benefit—slightly better prediction can result in much better returns. But it’s not as clear that individuals will benefit. Because of error, there are always rejected individuals who would have performed well if they had been selected, and there are always accepted individuals who perform badly even though they were accepted.
As I mentioned earlier, correlations of .90 are very rare. So let’s get realistic. In recruitment contexts, the most predictive assessments (shown above) correlate with hire success in the range of .65, explaining 42% of the variance in hire success. That leaves 58% of the variance unexplained. The best hiring processes not only use the most predictive assessments, but also try to account for some of the unexplained variance by considering additional predictive criteria.
On the low end of the spectrum, there are several common forms of assessment that explain less than 9% of the variance in recruitment success. Their correlations with recruitment success are lower than .30. Yet some of these, like person-job fit (not to be confused with Precise Role Fit), reference checks, and EQ ability tests, are wildly popular. In the context of hiring, the amount of variance due to error in these cases (more than 91%) means there is a very big risk of being unfair to a large percentage of candidates.
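To see both ends of the spectrum side by side, here’s the same r-to-variance arithmetic for the correlations discussed above (a minimal sketch):

```python
for r in (0.90, 0.65, 0.30):
    explained = r ** 2
    print(f"r = {r:.2f} -> variance explained = {explained:.0%}, "
          f"unexplained (error) = {1 - explained:.0%}")
```

Running this prints 81% explained for r = .90, 42% for r = .65, and just 9% for r = .30, with the remainder due to error in each case.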
If you’ve read my earlier article about replication, you know that the power-posing research could not be replicated. You also might be interested to learn that the correlations reported in the original research were lower than .30. If power-posing had turned out to be a proven predictor of presentation quality, the question I’d have asked myself is, “How much effort am I willing to put into power-posing when the variance explained is lower than 9%?”
If we were talking about something other than power-posing, like reducing even a small risk that my child would die of a contagious disease, I probably wouldn’t hesitate to make a big effort.
Summing up (for now)
A basic understanding of prediction is worth cultivating. And it’s pretty simple. You don’t even have to do any fancy calculations. Most importantly, it can save you time and tons of wasted effort by providing a quick way to estimate the likelihood that an activity is worth doing (or a product is worth having). Heck, it can even increase fairness. What’s not to like?
My organization, Lectica, Inc., is a 501(c)(3) nonprofit corporation. Part of our mission is to share what we learn with the world. One of the things we’ve learned is that many assessment buyers don’t seem to know enough about statistics to make the best choices. The Statistics for all series is designed to provide assessment buyers with the knowledge they need most to become better assessment shoppers.