What every buyer should know about forms of assessment
In this post, I describe and compare three basic forms of assessment — surveys, tests of factual and procedural knowledge, and performative tests.
Surveys — measures of perception, preference, or opinion
What is a survey? A survey (a.k.a. inventory) is an assessment that asks a test-taker to choose from a set of options, such as “strongly agree” or “strongly disagree”, based on opinion, preference, or perception. Surveys can be used by organizations in several ways. For example, opinion surveys can help maintain employee satisfaction by providing a “safe” way to express dissatisfaction before workplace problems have a chance to escalate.
Just about everyone who’s worked for a large organization has completed a personality inventory as part of a team-building exercise. The results stimulate lots of water cooler discussions about which “type” or “color” employees are, but their impact on employee performance is unclear. (Fair warning: I’m notorious for my discomfort with typologies!) Some personality inventories are even used in high-stakes hiring and promotion decisions, a practice that continues despite evidence that they are very poor predictors of employee success [1].
Although most survey developers don’t claim that their assessments measure competence, many do. One opinion item of this kind, for example, appeared in a survey with the words “management skills” in its title.
Claims that surveys measure competence are most common when “malleable traits” (traits that are subject to change, learning, or growth) are targeted. One example of a malleable trait is “EQ”, or “emotional intelligence”. EQ is portrayed as a skill that can be developed, and several surveys purport to measure its development. What these surveys actually measure, however, is people’s perceptions of their own emotional skills, not the skills themselves.
Another example of surveys masquerading as assessments of skill is in the measurement of “transformational learning”. Transformational learning is defined as a learning experience that fundamentally changes the way a person understands something, yet the only way it appears to be measured is with surveys. Transformational learning surveys measure people’s perceptions of their learning experience, not how much they are actually changed by it.
The only survey-type assessments that can be said to measure something like skill are assessments like 360s, which ask people about their perceptions of other people’s skills. Although 360s inadvertently measure other things, such as how much a person is liked or whether a respondent agrees with that person, they may also document evidence of behavior change. If behavior change is what you’re interested in, a 360 may be appropriate in some cases. Keep in mind, however, that while a 360 may measure change in a target’s behavior, it is also likely to measure change in a respondent’s attitude that is unrelated to the target’s behavior.
For example, if a 360 survey is conducted prior to a leadership program that valorizes accessibility, participants may rate other employees as less accessible following the intervention than they did prior to it, even if those employees’ behavior has changed in the right direction. Why? Because the participants’ expectations were raised by the leadership program.
360-type assessments may, to some extent, serve as tests of competence, because behavior change may be an indication that someone has learned new skills. When an assessment measures something that might be an indicator of something else, it is said to measure a proxy. A good 360 may measure a proxy (perceptions of behavior) for a skill (competence).
There are literally hundreds of research articles documenting the limitations of surveys, but I’ll mention only one more here: all of the survey types I’ve discussed are vulnerable to “gaming”; people can easily figure out which answers are most desirable.
Surveys are extremely popular today because, relative to assessments of skill, they are inexpensive to develop and cost almost nothing to administer. Organizations spend millions of dollars every year on surveys, many of which are falsely marketed as assessments of skill or competence.
Tests of factual and procedural knowledge
A test of competence is any test that asks the test-taker to demonstrate a skill. Tests of factual and procedural knowledge can legitimately be thought of as tests of competence, because recalling facts and procedures is itself a basic skill.
The classic multiple-choice test examines factual knowledge, procedural knowledge, and basic comprehension. If you want to know whether someone knows the rules, which formulas to apply, the steps in a process, or the vocabulary of a field, a multiple-choice test may meet your needs. The developers of multiple-choice tests often claim that their assessments measure understanding, reasoning, or critical thinking. This is because some multiple-choice tests measure skills that are assumed to be proxies for understanding, reasoning, and critical thinking; they are not direct tests of these skills.
Multiple-choice tests are widely used because there is a large industry devoted to making them. However, they are increasingly unpopular because of their (mis)use as high-stakes assessments. They are often perceived as threatening and unfair because they are primarily used to rank or select people, and they are not helpful to the individual learner. Moreover, their relevance is often called into question because they don’t directly measure what we really care about: the ability to apply knowledge and skills in real-life contexts. Multiple-choice items are proxies.
Performative tests
Tests that ask people to demonstrate their skills directly, whether (1) in the real world, (2) in real-world simulations, or (3) as applied to real-world scenarios, are called performative tests. These tests usually do not have “right” answers. Instead, they employ objective criteria to evaluate performances for the level of skill demonstrated, and they often play a formative role by providing feedback designed to improve performance or understanding. This is the kind of assessment you want if what you care about is deep understanding, reasoning skill, or performance in real-world contexts.
High-quality performative tests are the most difficult tests to make, but they are the gold standard if what you want to know is the level of competence a person is likely to demonstrate in real-world conditions — and if you’re interested in supporting development. Standardized performative tests are not yet widely used, because the methods and technology required to develop them are relatively new, and there is not yet a large industry devoted to making them. But they are increasingly popular because many of them can be used to support learning.
Performative tests can be used to measure correctness or competence. If an interview or written-response test is scored with a rubric that focuses on whether particular elements are present or presented accurately, it is a test of correctness. If its rubrics focus on the level of skill demonstrated, it is a test of competence.
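To make the distinction concrete, here is a minimal sketch in Python of the two scoring approaches. Everything in it (the rubric contents, the sample response, and the function names) is hypothetical, invented for illustration; real rubrics are far richer and are applied by trained raters, not by string matching.

```python
# A minimal sketch contrasting correctness scoring with competence scoring.
# All rubric contents here are hypothetical, for illustration only.

def correctness_score(response: str, required_elements: list[str]) -> float:
    """Correctness rubric: what fraction of the required elements appear?"""
    found = sum(1 for element in required_elements if element in response.lower())
    return found / len(required_elements)

# A competence rubric describes ordered levels of increasing skill.
# A trained rater matches the whole performance to the best-fitting level.
COMPETENCE_LEVELS = [
    "names the problem",                          # level 1
    "explains a plausible cause of the problem",  # level 2
    "weighs competing causes and trade-offs",     # level 3
    "proposes and justifies a course of action",  # level 4
]

def competence_score(rater_judged_level: int) -> int:
    """Competence rubric: the score is the level of skill demonstrated."""
    if not 1 <= rater_judged_level <= len(COMPETENCE_LEVELS):
        raise ValueError("level out of range")
    return rater_judged_level

response = "The delay was caused by unclear ownership and by staffing gaps."
print(correctness_score(response, ["ownership", "staffing"]))  # 1.0
print(competence_score(3))  # a rater judged this a level-3 performance
```

Note the asymmetry: a correctness score can be computed mechanically from a checklist, while a competence score is a judgment about the level of skill a whole performance demonstrates.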
Unfortunately, performative tests may initially be perceived as threatening because people’s attitudes toward tests of knowledge and skill have been shaped by their exposure to high-stakes multiple-choice tests. The idea of testing for learning is taking hold, but changing the way people think about something as ubiquitous as testing is an ongoing challenge.
In addition to choosing assessments that measure what we intend to measure, we should also make sure they have a validity and reliability profile that is adequate for the context in which they will be used. See the Statistics for All series for information about interpreting reliability and validity statistics.
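For readers who want a feel for what a reliability statistic looks like in practice, here is a minimal sketch, assuming Python with NumPy, that computes one widely used internal-consistency statistic, Cronbach’s alpha, from a small matrix of invented Likert responses. The data are hypothetical, and alpha is only one piece of a reliability and validity profile, not a substitute for it.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 respondents x 4 Likert items (1-5 scale).
responses = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

A high alpha only tells you the items hang together; it says nothing about whether they measure skill rather than perception, which is the validity question this post is concerned with.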
Lectical Assessments
Lectical Assessments are performative tests — tests for learning. They are designed to support robust learning — the kind of learning that optimizes the growth of essential real-world skills. We’re the unchallenged leader of the pack when it comes to the sophistication of our models, methods, technology, and evidence base.
[1] Morgeson, F. P., et al. (2007). Are we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel selection. Personnel Psychology, 60, 1029–1033.