From the archives: The limitations of testing
Originally posted on March 15, 2010
During the first 10 years of my work developing Lectical Assessments, I spent a fair bit of time explaining how testing works. Gradually, I came to understand that the science behind our assessments was less important to clients than their practical benefits, so I shifted my focus away from academics toward practical application. Recently, a question about testing from a colleague sent me on a search through the archives of my original blog. Realizing that these articles contained important information that might be useful to at least a few of my readers, I decided to rescue some of them. This article on the limitations of testing was motivated by concerns about the overuse and misuse of test scores.
It is important for those of us who use assessments to ensure that they (1) measure what we say they measure, (2) measure it reliably enough to justify claimed distinctions between and within persons, and (3) are used responsibly. It’s relatively easy for testing experts to create assessments that are adequately reliable for individual assessment, and although it is more difficult to show that these tests measure the construct of interest, there are reasonable methods for showing that an assessment meets this standard. However, it is more difficult to ensure that assessments are used responsibly.
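To make point (2) concrete, here is a minimal sketch of how reliability limits the distinctions a score can support. It uses the standard classical test theory formula for the standard error of measurement (SEM = SD × √(1 − reliability)). This is not a description of how Lectical Assessments are scored. The function names, the score scale (SD of 15), and the reliability of 0.91 are all hypothetical values chosen for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def scores_distinguishable(score_a, score_b, sd, reliability, z=1.96):
    """Return True if two scores differ by more than measurement error
    alone would plausibly explain (roughly 95% confidence when z=1.96).

    The standard error of a difference between two independent scores
    is SEM * sqrt(2).
    """
    sem = standard_error_of_measurement(sd, reliability)
    se_difference = sem * math.sqrt(2.0)
    return abs(score_a - score_b) > z * se_difference


# Hypothetical scale: SD of 15 points, reliability of 0.91.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(f"SEM: {sem:.2f} points")
print("100 vs 110 distinguishable?", scores_distinguishable(100, 110, 15, 0.91))
print("100 vs 115 distinguishable?", scores_distinguishable(100, 115, 15, 0.91))
```

Under these assumed values, a 10-point gap between two people falls within measurement error, while a 15-point gap does not. Even a highly reliable test justifies only distinctions larger than its error band.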
Few consumers of tests are aware of their inherent limitations. Even the best tests, those that are highly reliable and measure what they are supposed to measure, provide only a limited amount of information. This is true of all measures. The more we home in on a measurable dimension (in other words, the greater our precision becomes), the narrower the construct becomes. Time, weight, height, and distance are all extremely narrow constructs. This means that each provides a very specific piece of information extremely well. When we use a ruler, we can have great confidence in the measurement we make, down to very small lengths (depending on the ruler, of course). No one doubts the great advantages of this kind of precision. But we can’t learn anything else about the measured object. Its length usually cannot tell us what the object is, how it is shaped, its color, its use, its weight, how it feels, how attractive it is, or how useful it is. We only know how long it is. To provide an accurate account of the thing that was measured, we need to know many more things about it, and we need to construct a narrative that brings these things together in a meaningful way.
A really good psychological measure is similar. The Lectical Assessment System, for example, is designed to go to the heart of development, stripping away everything that does not contribute to the pure developmental “height” (complexity level) of a given set of assessment responses. Without additional information about things like (1) the ways of thinking that are generally associated with complexity level in a particular skill area, (2) the specific ideas that are associated with a set of responses, and (3) observations about the quality of skills demonstrated in those responses, we cannot construct a terribly useful narrative.
And this brings me to my final point: A formal measure, no matter how great it is, should always be employed by a knowledgeable mentor, clinician, teacher, consultant, or coach as a single item of information about a given person that may or may not provide useful insights into relevant needs or capabilities. Consider this relatively simple example: a particular 2-year-old may be tall for his age, but if he is also somewhat underweight for his age, the latter measure may seem more important. However, if he has a broken arm, neither measure may loom large, at least until the bone is set. Once the arm is safely in a cast, all three pieces of information (weight, height, and broken arm) may contribute to a clinical diagnosis that would have been difficult or impossible to make if any one of them were missing.
It is my hope that the educational community will choose to adopt high standards for measurement, then put measurement in its place — alongside good clinical judgment, reflective life experience, qualitative observations, and honest feedback from trusted others.