“This Test Is Instructionally Useful.” Where’s the Evidence?

Oct 23, 2024

Strong Claims Require Strong Support

Many assessment companies claim their products will improve learning by giving teachers actionable insights to support their instruction. We’ve come to expect these overblown promises from all sorts of organizations, whether they’re marketing cars, pharmaceuticals, or political candidates.

Promises are a type of claim. Claims are statements about what a product, person, or process will do under certain conditions (if specified). Claims are not statements of fact. They are hypotheses that must be evaluated with evidence.

Unfortunately, I see very little evidence to support claims of instructional usefulness. To see what I mean, visit almost any testing company’s website. Their claims are easy to find. The evidence, not so much.

I was thinking about these claims and evidence when Carla Evans and I presented highlights from our book, Understanding Instructionally Useful Assessment, at the NCME Classroom Assessment conference in September. We were thrilled to be joined in this session by our terrific colleagues Kyla McClure, Nathan Dadey, and Lorrie Shepard.

Carla and I set a high bar when we defined an instructionally useful assessment.

An instructionally useful assessment provides substantive insights about student learning strengths and needs relative to specific learning targets that can positively influence the interactions among the teacher, student, and the content (Evans and Marion, 2024, p. 20).

At the conference (and in the book), we described the assessment features and characteristics that can facilitate or hinder appropriate instructional interpretations. We deliberately avoided guarantees of instructional usefulness because teachers play a critical role in interpreting and using assessment results.

Show Me the Evidence!

I don’t live in Missouri, but I agree with its Show Me philosophy. If you claim your assessment is instructionally useful, Carla and I shouldn’t have to serve as the instructional utility police (as tempted as we might be). The purveyors of these claims must provide evidence to support their statements. In other words, show us evidence that teachers, and perhaps others, can derive substantive, actionable insights about student learning from these tests.

What types of evidence would convince us that an assessment is instructionally useful?

Experimental design is often held up as the gold standard of research. It isn’t easy to do in education, and it might not provide the types of evidence we would want to support claims of instructional usefulness (e.g., substantive insights). An experiment to evaluate the instructional usefulness of a particular assessment would include these steps:

  • Randomly select a sample of teachers from the population, recognizing that teachers are nested in schools
  • Randomly assign some teachers to the “treatment” group (they’d use the assessment being studied)
  • Randomly assign some teachers to the “control” group (they’d use a different assessment or gauge student learning the way they normally do)
  • Identify a meaningful outcome variable (e.g., growth on the state assessment)
  • Evaluate the difference between the two groups on this outcome measure (sketched below)
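
To make that last step concrete, here is a minimal sketch in Python of how the treatment-control difference might be estimated while respecting the nesting of teachers in schools. The data file, the column names (growth, treatment, school), and the mixed-effects model are illustrative assumptions on my part, not a prescribed analysis.

```python
# A minimal sketch, assuming a tidy data set with one row per teacher:
#   growth    - average student growth on the state assessment (outcome)
#   treatment - 1 if the teacher used the assessment being studied, else 0
#   school    - school identifier (teachers are nested in schools)
# The file name, column names, and model choice are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("teacher_outcomes.csv")  # hypothetical data file

# A random intercept for school accounts for teachers being nested in schools.
model = smf.mixedlm("growth ~ treatment", data=df, groups=df["school"])
result = model.fit()

# The coefficient on `treatment` estimates the average growth difference
# between teachers who used the assessment and those who did not.
print(result.summary())
```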

Despite the “gold standard” label, we think this is impractical and will miss key insights. True experiments are notoriously tough to do in education for many logistical and ethical reasons. More importantly, simply looking at a distal outcome would miss the thinking and other processes teachers use to make sense of assessment results.

Cognitive laboratories are also known as think-aloud protocols. As the name suggests, this approach asks participants to verbalize their thinking as they engage in an activity. We do this with students during test development to understand how they engage with test items. We can gain similar insights from teachers as they interact with student work and assessment score reports.

In these studies, teachers are prompted to interpret the student work or score reports as if they’re talking to another teacher. The interviewer probes for descriptions of specific interpretations the teacher generates from score reports. The researcher also asks teachers to describe their likely instructional actions based on their interpretations.

Classroom observations may have a bad name because of their role in teacher evaluation, but when done by well-trained observers, they can shed light on how teachers make sense of assessment information.

Compared with other methods, classroom observations have a major advantage: they give coaches and others insight into how teachers interpret and act on formative assessment activities and other informal assessments. Classroom observations also allow us to gather data about the effectiveness of teachers’ actions based on their interpretations of assessment results.

Carefully crafted surveys can provide information about instructional usefulness, but they require questions that pose scenarios so that they elicit evidence of how teachers interpret student work samples and score reports. Surveys that simply ask teachers whether they found the assessment results instructionally useful are not worth the effort.

An advantage of surveys is that researchers can collect data from a representative sample of respondents, allowing for types of generalizations that are difficult to accomplish with smaller-scale studies like cognitive laboratories and observations. But, again, they have to be very thoughtfully designed.

Evaluating Evidence

Once the data are collected, researchers should be able to determine whether the assessment supported instructionally useful interpretations and actions. For example, teachers’ interpretations and actions could be compared with those of master teachers or other experts to evaluate claims about the potential instructional usefulness of a particular assessment.
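
As one illustration of how such a comparison could be quantified (my own sketch, not a method from the book): if teacher and expert interpretations of the same score reports are coded into categories, chance-corrected agreement can be summarized with a statistic such as Cohen’s kappa. The category labels below are hypothetical.

```python
# A minimal sketch, assuming teacher and expert interpretations of the same
# score reports have been coded into categories; the labels are hypothetical.
from sklearn.metrics import cohen_kappa_score

teacher_codes = ["reteach", "extend", "reteach", "monitor", "extend"]
expert_codes = ["reteach", "extend", "monitor", "monitor", "extend"]

# Cohen's kappa corrects raw percent agreement for agreement expected by chance.
kappa = cohen_kappa_score(teacher_codes, expert_codes)
print(f"Agreement with expert interpretations (kappa): {kappa:.2f}")
```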

However, teachers are not blank slates. They interpret new assessment information in light of what they already know about students in their class and their learning strengths and needs in specific content areas. Therefore, evaluating evidence of instructional usefulness must be contextualized in terms of what teachers already know. Even if an additional assessment provides substantive insights, if those insights are not new to teachers, users and decision-makers must consider whether it is worth the time to administer it.

Who Is Responsible for Demonstrating Usefulness?

I’ve laid out several approaches for collecting evidence of instructional usefulness. But who is responsible for this evidence? It’s simple. If you’re making a claim, you’re responsible for providing the evidence. Don’t advertise a claim until you have evidence to support it. Aside from true experiments, the types of evidence I described are not difficult to collect.

While test vendors bear most of this responsibility, those making assessment decisions (e.g., district leaders) share it. When shopping for assessments, district leaders and other decision-makers must ask for evidence of instructional usefulness before making a purchasing decision.

Even if there appears to be evidence, district leaders should evaluate the degree to which the assessments support instructional decisions and actions in their context. Instructional usefulness isn’t an on-off switch; it exists on a continuum, and leaders should understand how useful a test is and isn’t. 

Again, the types of studies I described are not difficult to do. And we owe it to teachers to support what they want most: improving their students’ learning.
