Test Score Meaning Under Remote Test Administration (Part 2)
Mode or Accommodation? A Framework for Thinking about How Remote Administration Impacts Test Score Meaning
This is the second of two posts on planning for the examination of the validity of scores collected through remote test administration. In the first post, Michelle Boyer and Leslie Keng laid out the reasons why states should be concerned about the effect of remote testing on the comparability of test score meaning.
With the format of instruction – in-person, hybrid, or remote – at any given school in question for the upcoming Spring 2021 semester, the possibility of remotely administering federally mandated state summative assessments is becoming all too real. With many school districts offering only remote instruction, and others offering hybrid instruction loath to give up any in-person time for state assessment, the pressure to implement a remote testing solution is only likely to grow as the school year passes. Such an administration solution will need to be flexible – not only because instructional formats can shift rapidly between in-person, hybrid, and remote learning as schools respond to COVID-19 outbreaks, but also because any given school may be supporting a variety of instructional approaches (e.g., hybrid for some students, fully remote for others). This post offers a framework for interpreting test score meaning for states planning for remote administration.
Adapting What We Know to These New Learning Environments
There is virtually no precedent for administering state summative assessments remotely. Almost all of the research and work to date on remote assessment has focused on higher education and on certification and licensure (e.g., Langenfeld, 2020), with the exception of the relatively small-scale Advanced Placement administration in Spring 2020, which was met with “both controversy and some technical difficulties” (Camara, 2020, p. 4). The applicability of this work appears limited, as it suggests approaches that may not be amenable to state summative assessments, including live video proctoring and single-day administration (Langenfeld, 2020; Isbell & Kremmel, 2020; Steger, Schroeders, & Gnambs, 2020). In addition, the number of tested students in higher education and in certification and licensure has typically been much smaller than is common in statewide assessment.
In considering these complexities, those thinking about administering state summative assessments remotely may find it useful to frame remote administration as either a mode of administration or an accommodation. Restated: in what ways should remote administration be treated as a mode of administration, like pencil-and-paper or computer-based testing? And in what ways should it be treated as an accommodation, given to students who need it in order to fully access the content (e.g., like an extended-time accommodation)? Each perspective comes with its own key questions, empirical analyses, and assumptions about who is eligible for remote administration.
Remote Administration in Terms of Mode of Administration
Framing remote administration as a mode of administration places it on equal footing with other modes, such as in-school paper-based and computer-based administration. In doing so, the key question becomes whether “direct evidence of score interchangeability” (Wang, Jiao, Young, Brooks, & Olson, 2007, p. 220) can be obtained – that is, whether so-called mode effects are absent. A mode effect refers to “differential examinee performance that can occur due to differences in the presentation” across modes (ibid.). Treating remote testing as a mode is therefore aimed at minimizing mode effects, and at determining through post hoc analysis – generally a comparison of performance across assessment modes – whether such effects occurred. The absence of mode effects then allows scores from the varying modes to be used interchangeably.
Summaries of mode effects between in-school paper-based and computer-based administrations have found mode effects to be generally small on average (e.g., Kingston, 2008; TEA, 2008; Wang et al., 2007; Wang, Jiao, Young, Brooks, & Olson, 2008), though this average includes some studies with sizable mode effects. Whether mode effects for remote administration would be similarly small, with some idiosyncratic exceptions, remains unknown – a troubling state of affairs for those trying to decide now which modes of administration to support.
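As a concrete illustration of the kind of post hoc comparison described above, the sketch below computes a standardized mean difference (Cohen’s d with a pooled standard deviation) between scores from two administration modes. This is a hypothetical, minimal example – the function name and the score data are assumptions, and an operational mode study would typically use matched samples or IRT-based methods rather than raw group means.

```python
import statistics as stats

def mode_effect(scores_a, scores_b):
    """Standardized mean difference (Cohen's d) between two
    administration modes, using a pooled standard deviation."""
    mean_a, mean_b = stats.mean(scores_a), stats.mean(scores_b)
    var_a, var_b = stats.variance(scores_a), stats.variance(scores_b)
    n_a, n_b = len(scores_a), len(scores_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b)
                 / (n_a + n_b - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Hypothetical scale scores from a remote and an in-school group;
# by convention, |d| below roughly 0.2 is read as a small effect.
remote = [510, 495, 502, 488, 515, 499]
in_school = [505, 500, 498, 492, 510, 503]
d = mode_effect(remote, in_school)
```

A real analysis would also test whether d differs meaningfully from zero and examine effects by subgroup and item type, but the core comparison takes this shape.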
The emphasis on mode effects corresponds with the way in which assessment modes are offered. Generally, modes are provided widely and without reservation, and the mode of administration is typically the same for an entire school or district. In contrast to accommodations, which are offered only to eligible students, modes of assessment are often spread widely throughout a state – although one mode is often favored (e.g., in-school computer-based assessment).
Remote Administration in Terms of an Accommodation
Framing remote administration as an accommodation puts it on equal footing with accommodations like read-aloud and extended time. This is not to say that remote administration would follow the eligibility guidelines for accommodations and be provided to students based on their membership in groups like students with disabilities or English language learners. Instead, treating remote administration like an accommodation means there would be clear criteria defining what conditions must exist for students who test remotely to have fair access to the tested content.
Thus, the state would need to lay out the criteria upon which students would be eligible for remote administration, and decisions would likely be made on a case-by-case basis. In doing so, the key question becomes whether remote administration introduces too much construct-irrelevant variance for eligible students. The impact of such variance may be difficult to pin down, as remote testing involves construct-irrelevant factors that could depress scores (e.g., a poor testing environment, problematic internet access, less-than-ideal devices) and factors that could inflate them (e.g., cheating, well-intentioned help from parents). Post hoc empirical analyses, perhaps patterned on so-called “differential boost” analyses (e.g., Sireci, Scarpati, & Li, 2005), could help tease out whether student scores are systematically impacted by remote administration.
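To make the differential boost idea concrete, the sketch below frames it as a simple difference-in-differences: the remote-versus-standard score difference for the group eligible for remote administration, minus the same difference for a comparison group. The function, group labels, and data are hypothetical assumptions for illustration only; an operational analysis would rely on matched samples and significance testing rather than raw means.

```python
import statistics as stats

def differential_boost(eligible_remote, eligible_standard,
                       comparison_remote, comparison_standard):
    """Difference-in-differences sketch of a differential boost
    analysis: is the remote-vs-standard score difference larger for
    the eligible group than for the comparison group?"""
    eligible_diff = (stats.mean(eligible_remote)
                     - stats.mean(eligible_standard))
    comparison_diff = (stats.mean(comparison_remote)
                       - stats.mean(comparison_standard))
    return eligible_diff - comparison_diff

# Hypothetical score lists: a result far from zero suggests remote
# administration shifted scores differently for the eligible group.
boost = differential_boost([502, 498], [494, 490],
                           [500, 504], [499, 503])
```

Under the accommodation framing, a near-zero result is reassuring; a large positive or negative result would flag a systematic impact worth investigating.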
Planning for Remote Administration
Likely, any given state’s approach will draw from both framings, treating remote administration as a mode in some ways and as an accommodation in others. From the mode perspective, a state implementing remote administration should:
- Develop the processes and procedures that accompany a new mode of assessment. For remote administration, these include ensuring that the platform works – and works across the multitude of devices students may use – providing students with the opportunity to familiarize themselves with the platform, implementing proctoring, defining assessment windows, ensuring that students can log on and take the test, and supporting students while they test.
- Define the conditions under which scores from a remote test administration can be used interchangeably (or, more generally, whether scores will be used interchangeably) and work to ensure that these conditions are met.
- Plan and conduct analyses to determine whether a mode effect exists, and if so, what the state response will be if mode effects are discovered (e.g., will mode adjustments be made?).
From the accommodations perspective, a state implementing remote administration should:
- Consider which students are eligible for remote testing, whether that decision is based on individual need or on the instructional formats selected by schools or districts.
- Establish criteria for minimizing the introduction of construct-irrelevant variance, including criteria for internet access, an appropriate device, and a conducive testing environment.
In summary, choices made about the availability and implementation of a remote administration option will push a state’s approach toward one of these two framings. The support required to implement a successful remote solution is likely to be substantial, so those considering remote test administration should begin planning now. Drawing from current practice on modes and accommodations, identifying where that practice falls short, and implementing solutions for those gaps will be key to ensuring that a remote solution is successful.