Collaboratively Learning About Through Year Assessments
It Will Take a Village to Fulfill the Opportunities of Through Year Assessment Systems
“Through year” or “through course” assessment systems are rapidly proliferating as an alternative to the single, end-of-year administration of state accountability tests. At least ten states and their associated assessment companies are in various stages of exploring, designing, and/or developing these systems. To help those of us at the Center for Assessment and the larger field learn more about the opportunities and challenges associated with these systems, we convened a group of assessment experts, state leaders, and industry professionals to wrestle with the design, implementation, and validation of through year assessments.
Given the set of experts we brought together for four webinars in 2021, it’s not surprising that approximately 200 participants joined each session to engage in deep thinking about this new approach to state summative assessment. Below, I recap highlights from each of the four webinars and conclude with a brief discussion of some lessons learned.
Session 1: Definitions, Aims, and Use Cases
Nathan Dadey and Brian Gong kicked off the event by offering a definition of through year assessments:
Those assessments administered multiple, distinct times across a school year, designed to support both annual summative determinations of proficiency and at least one additional goal.
Nathan and Brian then described a range of potential aims and use cases for through year designs, emphasizing the importance of clearly describing the “problem” the system is intended to address as the first step in design, and of outlining the evidence necessary to support the claims and assumptions.
We had the chance to learn from Chanda Johnson (Louisiana), Jeremy Heneger (Nebraska), and Laine Bradshaw (Navvy, GA) about the issues they were trying to address and the ways they were trying to solve them. Thanks to the clear descriptions that Laine, Jeremy, and Chanda provided of their very different designs, each tailored to its unique context and purposes, convening participants quickly learned there is no prototypical through year design.
Session 2: Claims, Designs, and Evidence
There are many claims and potential inferences that must be evaluated in any through year design. Nathan and Brian provided a detailed description of how theories of action help to outline the claims the assessment program is intended to support. They connected the development of a theory of action to an interpretation and use argument (IUA; Kane, 2013) to structure the evaluation of the inferences drawn from assessment scores. The session focused mostly on claims related to instructional improvement as a result of distributing test results to teachers and administrators multiple times each year. We were fortunate to have several experts in instruction and teacher learning join us for this webinar.
Courtney Bell (Wisconsin Center for Education Research) and Leslie Nabors-Olah (ETS) are both experts in evaluating instruction. They provided deep insights into the types of professional learning opportunities and interventions necessary to bring about meaningful changes in instruction, and the types of evidence needed to evaluate such changes. Karen Barton (NWEA) outlined the intended effects of the systems they are designing and the types of evidence they plan to collect. A fascinating discussion ensued, and the chat was popping with such luminaries as Jim Pellegrino (University of Illinois-Chicago) and Randy Bennett (ETS) offering their observations.
Session 3: Technical and Logistical Issues
The third session opened with an audience exercise asking participants to identify their biggest technical or logistical worries about through year assessment systems. “Scores and interpretations,” “unintended consequences,” “comparability,” “aggregating multiple measures,” and “teacher buy-in” were the most common worries, though there were many other interesting responses, such as those focused on “losing the formative uses of the results.” Will Lorié and Nathan Dadey then led participants in a discussion of five big issues we’ve been worried about:
- Aggregation
- Alignment
- Field-testing
- Standard setting (establishing cutscores)
- Reporting
Following this introduction, Will teed up Laine Bradshaw (Navvy, GA), Garron Gianopulos (NWEA, GA & NE), and Ye Tong (Pearson) to identify one or two issues that worry them and to describe how they are trying to solve them. They highlighted issues such as maintaining large item pools through field testing, alignment, producing a summative score, and accommodations and accessibility. Again, we’re grateful for the opportunity to learn from those closest to the work.
Session 4: Threading the Needle
We closed the convening with a highly interactive and pragmatic session. We had eight amazing panelists—Ye Tong, Allison Timberlake (Georgia DOE), Garron Gianopulos, Meagan Karvonen (Atlas, University of Kansas), Jeremy Heneger, Brian Gong, Will Lorié, and Nathan Dadey—each highlighting an issue or two and offering suggestions for addressing them. This fascinating discussion wove in critical issues of accountability demands and unintended negative consequences, as well as the challenges of meeting potentially unrealistic hopes from district and school leaders.
Closing Thoughts
This convening would not have been possible without the intense effort, commitment, and deep thinking that my colleagues Nathan Dadey, Brian Gong, and Will Lorié put into organizing the event, and the expertise of the terrific panelists who joined us. Finally, I’m grateful to the hundreds who participated in each session. Chris Domaleski commented that the chat reflected the contributions of a who’s who of educational assessment.
I could write an entire post on all the things I’ve learned, but I’ll close with two thoughts that stuck with me. Coming into the event, we thought users and developers would be designing systems to meet both accountability and instructional goals, but with a need to favor accountability functions given the federal requirements. We have spent a lot of time obsessing about how to most validly aggregate the results of the multiple assessments into a single, summative annual determination. Read Brian Gong’s posts on this topic for more detail.
However, many participants prioritized instructional uses while trying to make the accountability requirements fit within the constraints imposed by the learning goals. It turns out that many (not all) of the designs presented at the convening got around the accountability issue by placing essentially all of the summative determination weight on the end-of-year assessment. Some use the through year components to provide prior information that could make the final test more efficient. These approaches sidestep some of the thorny issues of knowledge and skills that develop over time, as well as the practical problem of missing values from students who do not participate in every test administration during the year. Of course, we then need to ask: how different is this approach from the current system?
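To make that contrast concrete, here is a minimal, purely hypothetical sketch of the two aggregation approaches described above. The weights, scale scores, and function names are invented for illustration only and do not reflect any state’s or vendor’s actual scoring rules.

```python
# A purely illustrative sketch: two ways a through year design might turn
# three administrations into an annual summative score. Weights and scores
# are hypothetical, not any program's actual rules.

def aggregated_score(admin_scores, weights=(0.25, 0.25, 0.50)):
    """Weighted aggregation across all administrations (the approach we
    have spent so much time worrying about)."""
    return sum(w * s for w, s in zip(weights, admin_scores))

def end_of_year_only(admin_scores):
    """Place essentially all summative weight on the final administration;
    earlier administrations inform instruction but not the annual score."""
    return admin_scores[-1]

scores = [480, 505, 530]  # hypothetical fall, winter, and spring scale scores

print(aggregated_score(scores))  # 511.25
print(end_of_year_only(scores))  # 530
```

Under the second approach, the missing-data and growth issues largely disappear because only the final score counts toward the annual determination, which is exactly why it ends up looking so similar to the current end-of-year system.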
This convening revealed the importance of bringing together professionals with varying perspectives and positions to make headway on a still-developing problem of practice. We know we have a lot more to learn about whether through year assessment can meet the intended goals of policymakers and educators, but this convening was a good start to help us understand the range of issues we all need to wrestle with in the coming years. Finally, I promise to reveal, in a forthcoming post, my ideal design for a distributed assessment system. Stay tuned to CenterLine!