Most recruitment assessments are unfair to job candidates
Are your recruitment assessments fair to candidates? Check out this litmus test.
I studied psychometrics at UC Berkeley in the 1990s, taking several graduate-level courses in sophisticated methods like IRT (item response theory) and HLM (hierarchical linear modeling). The program at Berkeley was (and is) one of the best in the world.
In my work, I continue to adhere to the standards for statistical reliability taught in Berkeley’s program—especially when it comes to what test developers call high-stakes assessments. These are the assessments used to help with decisions like who to hire, promote, or admit to an educational program.
I have since learned that the reliability standards taught in UC Berkeley’s psychometrics program were much higher than the currently accepted standards for adult assessment. This is puzzling, because lower standards mean scores are less reliable, and therefore less precise. Let’s take a look at what this means in practical terms.
In psychometrics, reliability statistics tell us how precise our measurements are. In other words, they tell us how much noise there is around a given score. Whereas measurements we make in the physical world (like temperature readings) are incredibly precise, educational and psychological measurements are incredibly imprecise. You can think of a temperature reading as a point on a scale. A score on a psychological or educational assessment is more of a range.
Take a look at the 3 thermometers shown here. All of them show a score/temperature of 55. Below each thermometer is an Alpha (Cronbach’s alpha), the most commonly used indicator of reliability in the adult assessment literature. To the right of each thermometer I’ve added a curly bracket that shows the range in which the true score of an individual with a score of 55 is likely to fall. This range is called a confidence interval.
The next image shows how many distinct levels can be identified by assessments with Alphas of .75, .85, and .95.
If its Alpha is .75, an assessment can distinguish only two levels on the 100-point scale. If its Alpha is .85, it can distinguish about 3.5 levels, and if its Alpha is .95, it can distinguish about 6 levels. You can see why an Alpha of .95 or above is the gold standard for high-stakes educational and psychological assessment.
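The article does not say how these level counts were derived. One formula sometimes used for this purpose in the Rasch measurement literature is Wright's strata index, which converts a reliability coefficient into a separation ratio and then into a number of statistically distinct levels. The sketch below assumes that formula; the function names are illustrative, and the results come out close to, but not identical with, the figures quoted above (the .75 case lands somewhat above two).

```python
import math


def separation_ratio(reliability):
    """Separation ratio G = sqrt(R / (1 - R)) for a reliability R in [0, 1)."""
    return math.sqrt(reliability / (1.0 - reliability))


def strata(reliability):
    """Wright's strata index: (4G + 1) / 3 distinct levels.

    Assumption: this is one common formula, not necessarily the
    one used to produce the figures in the article.
    """
    return (4.0 * separation_ratio(reliability) + 1.0) / 3.0


for alpha in (0.75, 0.85, 0.95):
    print(f"Alpha {alpha:.2f}: about {strata(alpha):.1f} levels")
```

Because the separation ratio grows quickly as reliability approaches 1, small gains at the top of the Alpha range buy noticeably more distinguishable levels than the same gains lower down.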
There are several assessments employed in recruitment (including some popular 360 and culture fit assessments) that have Alphas in the .75 range. Many personality inventories have Alphas in the .85 range. Relatively few adult assessments used in recruitment have Alphas in the .95 range.
Imagine that you have taken an assessment as part of a recruitment process. Let’s say your score is 55.
Here are the ranges in which your true score is likely to fall.
If the Alpha for the assessment is .75, your true score is likely to fall somewhere between 30 and 80. If the Alpha is .85, it is likely to fall between 40 and 70. If the Alpha is .95, it is likely to fall between 45 and 65.
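For readers who want to see where ranges like these come from, here is a minimal sketch of the textbook approach: compute the standard error of measurement (SEM) as the score standard deviation times the square root of one minus the reliability, then build a 95% interval around the observed score. The standard deviation of 25 is a hypothetical value chosen for a 100-point scale, not a figure from this article, so the output will not reproduce the ranges above exactly.

```python
import math


def confidence_interval(score, alpha, sd, z=1.96):
    """Return (low, high) around an observed score.

    Uses SEM = sd * sqrt(1 - alpha); z=1.96 gives a 95% interval.
    """
    sem = sd * math.sqrt(1.0 - alpha)
    half_width = z * sem
    return score - half_width, score + half_width


# sd=25 is an assumed, illustrative value (not taken from the article).
for alpha in (0.75, 0.85, 0.95):
    low, high = confidence_interval(55, alpha, sd=25)
    print(f"Alpha {alpha:.2f}: {low:.1f} to {high:.1f}")
```

The pattern is the same as in the ranges above: the interval narrows as Alpha rises, because less of the score variance is attributed to noise.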
This next image shows your ranges alongside the ranges for Tom, whose score is 75. If an employer were going to choose between you and Tom, which Alpha level would you find fair?
Hint: An assessment would need to have an Alpha of .95 to reliably distinguish between you and Tom.
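The comparison can be sketched in a few lines using the ranges quoted earlier for a score of 55 (plus or minus 25, 15, and 10 points). The sketch assumes Tom's range has the same width as yours at each Alpha level, which the text does not state explicitly; the names HALF_WIDTH, score_range, and ranges_overlap are illustrative.

```python
# Half-widths implied by the ranges quoted in the article for a score of 55.
HALF_WIDTH = {0.75: 25, 0.85: 15, 0.95: 10}


def score_range(score, alpha):
    """Return (low, high) for a score, assuming the article's range widths."""
    half = HALF_WIDTH[alpha]
    return score - half, score + half


def ranges_overlap(a, b):
    """True if two (low, high) ranges share more than a single endpoint."""
    return a[0] < b[1] and b[0] < a[1]


for alpha in (0.75, 0.85, 0.95):
    you = score_range(55, alpha)
    tom = score_range(75, alpha)  # assumption: same width as your range
    verdict = "overlap" if ranges_overlap(you, tom) else "do not overlap"
    print(f"Alpha {alpha:.2f}: you {you}, Tom {tom} -> ranges {verdict}")
```

At .75 and .85 the two ranges overlap, so the scores cannot be told apart with confidence. At .95 the ranges only meet at a single point (65), which this sketch treats as not overlapping.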
You can think of Alpha as an indicator of measurement precision. The higher the Alpha, the more precise the measurement. The higher the stakes attached to a measurement, the higher Alpha should be.
Reliability is not the only thing that’s important when selecting an assessment. You’ll also want to consider the importance of the trait or skill being measured and whether the assessment actually measures what it claims to measure. I’ve written about these issues in the “Statistics for all” series.