Pop quiz: AI matches human performance at developing good test questions

May 18, 2023 Staff

Researchers have developed an artificial intelligence (AI) model that can generate assessment questions for online courses, and instructors rated those questions on par with questions written by humans.

The new AI is called QUADL, and it does two things: it identifies key terms and ideas in instructional texts, and then crafts questions that focus on those terms and ideas.

“We provide QUADL with the courseware contents and the learning objectives for the curriculum, and QUADL can then develop questions that help students achieve those learning objectives,” says Noboru Matsuda, associate professor of computer science at North Carolina State University and co-author of a paper on the work.

“Humans are good at developing courses, but in interviews with instructors and courseware developers, we found that they often struggle to develop questions that are effective at assessing student progress on the learning objectives for those courses,” says Machi Shimmei, a Ph.D. student at NC State and first author of the paper. “Our study suggests QUADL can be a useful tool for instructors and course developers.”

To test QUADL’s performance, the researchers used existing online courseware from the Open Learning Initiative (OLI). The researchers recruited five instructors who use the OLI for their classes and asked them to evaluate a lengthy list of questions. Some of the questions were generated by QUADL; some were generated by the current state-of-the-art question-generating AI model (called Info-HCVAE); and some were already in use in the OLI courses. Study participants were not told where the questions came from, and were asked to assess the pedagogical value of each question.

“The pedagogical value scores given to questions generated by QUADL were essentially identical to the value scores that instructors gave to questions written by people for use in the OLI,” Shimmei says. “The questions generated by Info-HCVAE received lower scores from the instructors.”

The researchers are now planning undergraduate classroom studies in which instructors will use questions generated by QUADL, to see how, if at all, those questions affect student learning.

“This forthcoming work should close the loop for this technology,” Matsuda says. “Hypothetically, QUADL will work. Now we have to see if it actually will work in practice.”

QUADL is part of a larger suite of AI technologies, called PASTEL, that Matsuda and his collaborators are developing. All of the PASTEL technologies are designed to facilitate the development of educational courseware.

“These technologies deal with everything from generating questions – which is QUADL’s role – to quality assurance functions used to assess how effective each element of the courseware is at helping students learn,” Matsuda says. “We are looking for both research partners to help us develop these generative AI technologies, and for partners who are educators interested in using these AI tools in their courses.”

The paper, “Machine-Generated Questions Attract Instructors when Acquainted with Learning Objectives,” will be presented at the 24th International Conference on Artificial Intelligence in Education (AIED 2023), which will be held July 3-7 in Tokyo, Japan. The paper was co-authored by Norman Bier of Carnegie Mellon University.

This research was done with support from the National Science Foundation, under grants 2016966 and 1623702.

-shipman-

Note to Editors: The study abstract follows.

“Machine-Generated Questions Attract Instructors when Acquainted with Learning Objectives”

Authors: Machi Shimmei and Noboru Matsuda, North Carolina State University; Norman Bier, Carnegie Mellon University

Presented: July 3-7, AIED 2023, Tokyo, Japan

Abstract: Answering questions is an essential learning activity on online courseware. It has been shown that merely answering questions facilitates learning. However, generating pedagogically effective questions is challenging. Although there have been studies on automated question generation, the primary research concern thus far is about if and how those question generation techniques can generate answerable questions and their anticipated effectiveness. We propose QUADL, a pragmatic method for generating questions that are aligned with specific learning objectives. The QUADL method consists of two parts: (1) The answer prediction model that identifies a key term, if any, in a given sentence that has an instructional relation with the given learning objective, and (2) the question conversion model that converts the given sentence into a question for which the predicted key term becomes an answer. We applied QUADL to an existing online course and conducted an evaluation study with in-service instructors. The results showed that questions generated by QUADL were evaluated as on-par with human-generated questions in terms of their relevance to the learning objectives. The instructors also expressed that they would be equally likely to adapt QUADL-generated questions to their course as they would human-generated questions. The results further showed that QUADL-generated questions were better than those generated by a state-of-the-art question generation model that generates questions without taking learning objectives into account.
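For readers who want a concrete picture of the two-part structure described in the abstract, here is a minimal Python sketch. It is not QUADL's implementation: the paper describes trained models, while the functions below (`predict_key_term`, `convert_to_question`, `generate_question`) are hypothetical names that use simple word-overlap and fill-in-the-blank heuristics purely as stand-ins. The example sentence and learning objective are also invented for illustration.

```python
import re
from typing import Optional, Tuple


def predict_key_term(sentence: str, objective: str) -> Optional[str]:
    """Toy stand-in for the answer prediction step.

    Assumption: a word counts as a "key term" if it also appears in the
    learning objective and is longer than three letters. The longest such
    word is returned, or None when the sentence shares no such word.
    """
    objective_words = {w.lower() for w in re.findall(r"[A-Za-z]+", objective)}
    candidates = [
        w for w in re.findall(r"[A-Za-z]+", sentence)
        if w.lower() in objective_words and len(w) > 3
    ]
    return max(candidates, key=len) if candidates else None


def convert_to_question(sentence: str, key_term: str) -> str:
    """Toy stand-in for the question conversion step.

    Assumption: the sentence is turned into a fill-in-the-blank item
    whose answer is the predicted key term.
    """
    pattern = rf"\b{re.escape(key_term)}\b"
    blanked = re.sub(pattern, "_____", sentence, count=1)
    return f"Fill in the blank: {blanked}"


def generate_question(sentence: str, objective: str) -> Optional[Tuple[str, str]]:
    """Chain the two steps; return (question, answer), or None if no key term."""
    key_term = predict_key_term(sentence, objective)
    if key_term is None:
        return None
    return convert_to_question(sentence, key_term), key_term


if __name__ == "__main__":
    objective = "Explain the role of mitochondria in cellular energy production."
    sentence = "Mitochondria produce most of the energy used by the cell."
    print(generate_question(sentence, objective))
```

The point of the sketch is the shape of the pipeline: a sentence that has no connection to the learning objective yields no question at all, which mirrors the abstract's "if any" wording for key terms.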

This post was originally published in NC State News.
