Assessment Data Masquerading as Instructionally Useful Information
New Book Explores “Blizzard” of Testing & Its Misuse
For teachers, assessment data is like a blizzard: they're inundated with a blinding swirl of information from the assessments they're required to give each year, such as interim assessments or state tests. The makers of these tests often claim they can provide "instructionally useful" information. But some assessment data just masquerades as instructionally useful.
That’s why Scott Marion and I have written a new book on this topic: Understanding Instructionally Useful Assessment. We use that blizzard metaphor to illustrate the confusion about which tests really provide instructionally useful information, and how this confusion affects teachers.
Not all assessments can provide instructionally useful information, but this often isn’t the message teachers get from many test vendors and education leaders. The reality is that assessments’ instructional utility depends on how they are designed, administered, scored, and reported to teachers.
Classroom teachers, for example, create activities and tasks to elicit evidence of students' learning strengths and needs. This informal assessment information is produced every day as teachers walk around the classroom observing student interactions, listening to collaborative conversations, or making sense of student work products and responses. Teachers also give students final exams, end-of-unit assessments, and sometimes even mid-unit quizzes to gauge the level of student understanding.
Outside Tests Push Classroom Assessment to the Sidelines
These types of informal to more formal assessments are typically under the control of the classroom teacher. We support the use of these teacher-created assessments; they clearly yield information that helps teachers adjust their instruction. But the dominance and stakes of external assessments—those outside the control of the classroom educator—can push the instructionally valuable classroom assessments to the sidelines.
Why do we care so much? Because teachers are understandably confused about how to use quantitative test results to inform their instruction, even though—ironically—students are being over-tested in a quest to make sure every student succeeds.
This over-testing occurs because of a misguided belief in the instructional utility of all assessments. The result is unbalanced systems of assessment, where precious instructional time is spent testing students on material they have not yet been taught and where testing encourages practices we know are not good for students (such as test-prep activities, narrowing the curriculum, and reteaching procedural knowledge and skills).
We’ve identified 10 assessment design and implementation features that work together to support more instructionally useful assessment. That doesn’t mean that if a test includes all 10, it automatically yields instructionally useful information. Many mediating factors can interfere with instructional usefulness, such as adding high-stakes consequences based on student test scores (accountability). But these are the assessment features to notice and interrogate when trying to gauge a test’s instructional utility.
- Cognitive complexity & associated item type
- Coherence with the enacted curriculum
- Breadth of content standards sampled & resulting grain size of results
- Type of results produced
- Timing of results
- Administration & scoring conditions
- Allowable student responses
- Student choice
- Collaboration
- Real-world & culturally relevant connections
Key Features of Instructionally Useful Tests
Educators have been asking key questions about how to understand instructionally useful assessment; that’s why we felt it was important to conceptualize instructional usefulness and identify the 10 assessment design and implementation features that influence instructional usefulness. These 10 assessment features are present to some degree in every academic assessment—whether informal and flexible or more formal and standardized—but they can be turned up or turned down like the volume on a radio.
For example, one assessment feature is how closely tied the assessment is to the enacted curriculum: Is the assessment embedded in the curriculum, or is the assessment designed so that it doesn’t matter what curriculum is implemented?
The connection between the curriculum and the assessment is important to instructional usefulness because teachers often find it difficult to interpret assessment results when they are external to the scope and sequence of their enacted curriculum. Such tests may even use different terminology, and different ways of asking students to solve problems, respond to texts, or analyze data, than students encounter in their regular instruction.
Additionally, the breadth of the content standards sampled by the assessment can vary from an entire year of content standards to a cluster of related content standards (e.g., fractions) or even just one standard or one part of a standard. The amount of content tested on an assessment is important for instructional usefulness because teachers do not teach an entire year of standards in one day, one week, or one month. It is difficult for teachers to make sense of assessment results that cover too many standards, because they aren’t sure where to focus in their instructional interventions or differentiation strategies.
What Types of Results Does a Test Provide?
The type of results an assessment produces is also critical to supporting claims about instructional usefulness. For example, most state and commercial interim assessments provide quantitative reports to teachers, but as assessment expert Bob Linn said in 1983: "By itself, a test score does little to identify the nature of a problem, only that there is one." A test score doesn't provide qualitative and substantive insights into student strengths and learning needs that can be used to adjust future instruction. Rather, a test score merely signals whether a student is making sufficient progress or whether there is an issue that needs to be addressed.
That isn’t very helpful to a classroom educator looking to adjust or differentiate their instruction for specific students without having to do additional assessments to figure out the nature of the problem. A test score may be useful for other purposes—monitoring the quality of educational programs and systems over time, for instance—but not useful for informing instruction.
In this blog post, we’ve focused heavily on an assessment’s design and implementation features. But instructional utility also depends on a test’s purpose and use. The 10 design and implementation features play out differently in different types of assessments, such as formative assessment processes, classroom, school, and district summative and interim assessments, and state assessments. These factors, too, influence a test’s instructional utility.
Our hope is that the book equips classroom teachers, school and district leaders, and policymakers with the knowledge and understanding they need to move away from or advocate against data-rich but information-poor systems of assessment. We want to help educators, broadly speaking, dig out of the blizzard. As we say in the book:
“Armed with this knowledge, educators will be more able to remove unnecessary and redundant assessments from their system, advocate against inappropriate uses of assessments in schools, and use assessment information to support student learning.”
We hope that our book will serve as a snowplow to help clear a path for productive assessment interpretation and use.