Instructionally Useful Assessments: Don’t Tell Us—Show Us
Claims Must Be Supported by Evidence
What does it take for an assessment to support instruction? That sounds like a simple question, but we’ve learned that it requires a multi-faceted response. So much so that we’ve written an entire book about the topic: Understanding Instructionally Useful Assessment.
One of our main reasons for writing it was to address the claims by many assessment company representatives and education leaders about the instructional utility of a vast array of assessment products. When teachers struggle to use the results of these interim and state assessments to improve their students’ learning, they’re accused of not understanding how to use assessments well.
We think teachers are being asked to do the impossible: to squeeze instructional usefulness out of assessments that are not designed to provide that type of information. They’re essentially being asked to make lemonade without the lemons.
Assessment results have many important non-instructional purposes and uses, such as monitoring trends in academic achievement over time or evaluating educational programs. Some would like to claim that evaluating curriculum or professional learning programs is instructionally useful. We agree that these activities are useful, but they are too far removed from day-to-day teaching and learning for us to consider them instructionally useful.
Our definition of instructionally useful assessments (taken from our book) focuses narrowly on improving the interactions among students, teachers, and the content.
An instructionally useful assessment provides substantive insights about student learning strengths and needs relative to specific learning targets that can positively influence the interactions among the teacher, student, and the content… If the assessment doesn’t lead to changes in the interactions between students and teachers that improve student learning, we have difficulty considering the assessment, no matter what it does outside of the classroom, to be instructionally useful (pp. 21, 24).
We know that some will complain that our definition is overly restrictive and only fits things like formative assessment practices or rich performance tasks embedded in high-quality instructional materials. Yes, we acknowledge its restrictiveness. But “instructional usefulness” should not be a label awarded or denied based simply on surface features. It is a claim that must be supported or refuted through logic and evidence.
What Evidence Supports Claims of Instructional Utility?
The first step in determining whether an assessment is instructionally useful is specifying the interpretations and uses that we’d like to support with the results. In fact, such claims are at the heart of validity arguments. An interpretation-and-use argument, as described by Michael Kane (2006, 2013), is a somewhat formal way of outlining the various claims and the evidence necessary to support (or refute) the claims we make about assessment results.
We cannot evaluate the extent to which the evidence supports notions of instructional usefulness unless the claims are specific. They can’t be vague statements like “Teachers can use the results of this assessment to improve learning opportunities for students.” A specific claim sounds more like this: “Teachers should be able to gain insights at a small enough grain size to target skills and knowledge students have not yet grasped, or to identify what students need to learn next to increase their likelihood of success in the next unit.”
Kane provided examples of four types of inferences—scoring, generalization, extrapolation, and decisions—that can be used to structure a comprehensive validity argument. We would be happy to see full-blown validity arguments to evaluate claims of instructional utility, but in most cases, well-articulated claims and a clear set of evidence will suffice.
Below are some examples of potential claims, along with the types of evidence that could enable an evaluation of instructional utility.
| Potential Claims | Examples of Possible Evidence |
| --- | --- |
| The way the assessment is scored supports instructional decision-making by educators and supports students’ insight into their own learning. | Cognitive laboratories using different scoring models could document the instructional insights from the various models. |
| The assessment yields results at a grain size sufficient to inform instructional decisions. | Teacher surveys, classroom observations, and think-aloud protocols with teachers while they review score reports could shed light on this claim. |
| Teachers can interpret and use the results from the assessment to appropriately adjust instruction for individual students as well as for identifiable student groups. | Classroom observations, video simulations, think-aloud studies, and other direct observations (rather than simply a survey of teachers’ impressions) could provide supporting evidence. |
Again, these are just a few examples. Any assessment provider or sponsor promoting the instructional usefulness of their assessment must provide a detailed set of claims and the evidence needed to support (or refute) them.
More simply: How will teachers be able to use the results of the assessment to directly affect the instruction of their students? If a clear and compelling answer cannot be provided to this straightforward question, assertions of instructional usefulness just set teachers up for failure and add to our current over-testing culture. That’s not fair to teachers or students.