No Escape from Artificial Intelligence in Education
How AI is Reshaping Knowledge Work and Professional Learning
Artificial intelligence (AI) is on many people’s minds in one form or another. We see this when we read the news, listen to podcasts, and browse social media; when we use everyday tools like ChatGPT or Smart Compose and Smart Reply in Gmail; and when we talk to friends, family, and colleagues.
So it’s not a surprise that AI was an inescapable presence at two of the main conferences in the educational assessment field this year: the National Council on Measurement in Education (NCME) annual meeting and the National Conference on Student Assessment (NCSA). NCME featured 15 sessions on AI—on par with the number of sessions on differential item functioning and more than the number on equating—and NCSA included six.
For more on how AI is reshaping assessment and accountability, see our blog posts here and here.
NCME also has a dedicated special-interest group (SIG) on AI, which recently released an excellent paper on the connection between AI and educational measurement. AI-related issues also undoubtedly infuse discussion in related SIGs, such as Big Data or Educators of Measurement. NCSA dedicated its only post-conference session to AI: “Unlocking Potential: Harnessing AI for Inclusive Learning.”
Here are a few themes that emerged for me from these two conferences.
Clearer Definitions, Rapid Development
With the field’s increasing sophistication comes a finer-grained understanding of the distinctions among subfields. I noticed that people are more sharply defining the spaces they are working in, particularly in distinguishing generative AI from AI generally, and from other methodological approaches such as machine learning, deep learning, and automated scoring. This TikTok graphic captures the relationships among different disciplinary areas.
Many organizations already use some kind of automation in their processes; for example, they may use natural language processing tools to automatically score speech or writing. Far fewer are using AI tools to generate content such as text, images, videos, code, 3D objects, or 2D schematics. In assessment, generative applications include the creation of items or task materials, question prompts, feedback, first drafts, and interpretational guidance. These kinds of AI represent the more challenging territory we’re entering.
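To make the distinction concrete, here is a minimal sketch of the kind of non-generative automation that is already widespread: a classic automated-scoring baseline that predicts human-assigned essay scores from simple text features. The essays, scores, and rubric below are hypothetical placeholders, not any vendor’s production pipeline.

```python
# A toy automated-scoring baseline: regress human rater scores
# on TF-IDF text features. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: essay text paired with human rater scores.
essays = [
    "The water cycle begins when the sun heats the ocean.",
    "Evaporation is when water goes up and rain comes down.",
    "Water evaporates, condenses into clouds, and precipitates.",
]
human_scores = [4, 2, 5]  # e.g., a hypothetical 1-6 holistic rubric

# Fit a simple regression from word n-gram features to scores.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    Ridge(alpha=1.0),
)
model.fit(essays, human_scores)

# Score a new response. Operational systems add human-machine
# agreement checks (e.g., quadratic weighted kappa) and route
# low-confidence responses to human scorers.
print(model.predict(["Heat from the sun drives evaporation and rainfall."]))
```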
Development-oriented research on the utility and performance of AI is proliferating at lightning speed. At the conferences, colleagues told me they are keeping pace by reorganizing their knowledge work: to parse the emerging research, they are using AI tools from open-source platforms or building new ones that serve their needs.
An Abundance of Frameworks
Frameworks for AI are similarly proliferating. Colleagues at the American Institutes for Research and the National Center for Education Statistics, for example, are working on a multi-layered AI framework for educational assessment that they presented in an NCSA session entitled “Understand and Leverage Generative AI for Inclusive Educational Assessment.”
They noted that we’ve arrived at a place where there is an abundance of available frameworks: they had reviewed 20-plus national and international frameworks while creating theirs. Each framework foregrounds some aspects more than others. For example, some are focused on AI literacy generally, while others examine AI governance, AI competencies for students and teachers, or AI in educational learning platforms.
Similarly, Smarter Balanced and IBM are developing a framework for the responsible use of AI to support learning and assessment. (I am participating in that project, along with many other professionals in the field.) The emerging framework emphasizes equity, fairness, inclusion, and the minimization of bias, and is designed to help analyze and build out educational use cases for K-12 learners.
These ideas about responsible use are on the minds of many people in education; a quick scan of session titles at the conferences underscores this. That’s a good thing, because we can’t have thoughtful conversations about AI without addressing them. These conversations will undoubtedly also shape the much-needed revision of the Standards for Educational and Psychological Testing that has recently begun.
Many Empirical Research Projects Unfolding
As complex technologies emerge, people are working on many problems in parallel—development and evaluation of algorithms, bias evaluations, integration of the technology into platforms—and creating thoughtful visions for the future of education that go beyond the efficiency considerations that often drive this work.
Complete end-to-end applications of AI and generative AI across all cycles of assessment development, with humans in the loop, do not exist—at least not yet. Perhaps unsurprisingly, the most promising use cases come from educational learning providers like Duolingo and Khan Academy.
Duolingo, for instance, is already using generative (and non-generative) AI for tasks such as automated prompt/item development, automated scoring and calibration, adaptive administration, and plagiarism detection. They are experimenting with new types of multimodal, interactive tasks, AI-assisted scaffolding and prompting, and new test assembly methods. Listen to this keynote to hear more.
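To give a flavor of what adaptive administration involves under the hood, here is a minimal sketch of maximum-information item selection under a two-parameter logistic (2PL) IRT model, a standard ingredient of computerized adaptive testing. The item bank and ability estimate are illustrative assumptions, not Duolingo’s actual parameters or algorithm.

```python
# Select the next item by maximum Fisher information under a 2PL model.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item bank: (item_id, discrimination a, difficulty b).
bank = [("i1", 1.2, -0.5), ("i2", 0.8, 0.0), ("i3", 1.5, 0.7)]

def next_item(theta_hat: float, administered: set) -> str:
    """Pick the unadministered item most informative at the current ability estimate."""
    candidates = [item for item in bank if item[0] not in administered]
    return max(candidates, key=lambda item: information(theta_hat, item[1], item[2]))[0]

print(next_item(theta_hat=0.3, administered={"i1"}))  # most informative remaining item
```

Operational systems layer content balancing, exposure control, and security constraints on top of this basic selection rule.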
These organizations operate at very large (national or global) implementation scales with a wealth of user behavior data that, at a minimum, allows them to evaluate common technical and ethical issues of generative AI in systematic ways (see NCME session 032 as an example).
That data also allows them to empirically evaluate common risks related to fairness and bias. Associated rigorous, peer-reviewed work is being presented at technical convenings and published in conference proceedings and independent journals (see NCME sessions 054 or 129 as examples).
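To illustrate what such an empirical fairness check can look like, here is a minimal sketch of Mantel-Haenszel differential item functioning (DIF) analysis, a long-standing method for flagging items that behave differently for a focal group than for a reference group after matching on total score. The counts below are hypothetical.

```python
# Mantel-Haenszel DIF on hypothetical counts.
import math

# For each matched total-score stratum:
# (reference correct, reference incorrect, focal correct, focal incorrect)
strata = [
    (40, 10, 35, 15),
    (30, 20, 22, 28),
    (15, 35, 10, 40),
]

num = den = 0.0
for r_corr, r_inc, f_corr, f_inc in strata:
    n = r_corr + r_inc + f_corr + f_inc
    num += r_corr * f_inc / n  # reference-correct x focal-incorrect
    den += r_inc * f_corr / n  # reference-incorrect x focal-correct

alpha_mh = num / den                   # common odds ratio across strata
delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta scale
print(round(delta_mh, 2))              # |delta| >= 1.5 suggests large DIF
```

Operational programs pair a statistic like this with significance tests and effect-size classifications (such as the ETS A/B/C categories) before sending flagged items to committee review.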
In short, while it is important to have frameworks for thinking through risks, leading researchers are dedicated to finding rigorous, empirically grounded answers. These answers are increasingly available to anchor the conceptual considerations that frameworks raise in empirical reality and to support informed discussions.
I saw this clearly when I attended conference sessions with leading thinkers in the field who offered scientific, nuanced arguments for the pros and cons of generative AI (see NCME session 116 as an example).
Signaling Through RFIs and RFPs
State education leaders are also beginning to signal serious, sustained interest in exploring AI-supported innovations. You can see this in their requests for information or interest (RFIs) and their requests for proposals (RFPs).
For example, in a pair of recent procurements (D24-153 and D24-023), the Hawaii State Department of Education seeks to enhance its state assessment programs by using AI to create virtual students—and eventually virtual teachers and other stakeholders—for classroom assessment. The department uses the phrase “AI Assessment Community” to describe its vision.
Frameworks like the ones I described earlier will help guide the strategic planning of such AI projects and the evaluation of their implementation in fair, inclusive, and equitable ways.
(Speaking of RFIs and RFPs, in a fun twist, the National Science Foundation recently issued cautionary guidelines on using generative AI to write proposals in response to RFPs and to evaluate those proposals—and perhaps even to write the RFPs themselves.)
Building Capacity and Coherent AI Strategies
As all of these developments make clear, AI is here to stay and will take up notable conceptual and practical space in the work of state agencies and school districts. This has important implications for the creation of coherent AI strategies and data strategies supported by coherent approaches to data engineering, architecture, governance, and analytics.
It also requires the creation of effective data cultures that lead to new professional roles, such as data champion. In this regard, state leaders from Illinois, Rhode Island, and Michigan shared their experiences with us in a session we led at NCSA: “How to Be an Effective Data Champion”; check out the associated materials and resources in the online NCSA program.
State departments and school districts will need to invest more time and resources in these strategic efforts by increasing opportunities to learn from one another, from other districts or states, and, perhaps most importantly, from the best minds in the field. Some might hire staff into dedicated AI-leadership roles to keep up with advances in related fields and advise internally.