Computerised Adaptive Testing in the age of AI

Computerised Adaptive Testing (CAT) and Content Adaptive Progress Tests (CAPT) are two distinct approaches to adaptive assessment. CAT, the more established method, adjusts question difficulty based on a test taker's performance in real time. By calibrating assessment items (questions) against historical psychometric and performance data, and then using an algorithm to select questions with a roughly 50/50 chance of a correct response, CAT can determine a test taker's ability level efficiently. Consequently, fewer questions are needed to achieve reliable results compared to traditional tests.

CAPT, a more recent development, focuses on ensuring mastery of specific content areas. Questions are repeatedly presented until a test taker demonstrates proficiency in all required domains or topics of the assessment. This method is particularly suited to longitudinal assessment, such as progress testing in schools or healthcare education, where tracking knowledge acquisition over time is crucial. By adapting to the content areas where a test taker struggles, CAPT effectively supports learning and development.

Requirements for good adaptive testing

Developing and deploying a successful CAT requires a combination of technical expertise, psychometric knowledge, and careful planning. Here are the key requirements:

Psychometric Foundations

Defining the content and construct to be measured is crucial for building a valid and reliable test. This includes selecting the topics or domain areas to be tested, deciding the cognitive level of the test (Bloom's Taxonomy is often used to categorise questions into levels such as knowledge, comprehension, application, analysis, synthesis, and evaluation), and thinking about the types and distribution of the items (questions) themselves. Below are some other factors that also need careful consideration; a minimal code sketch of how they fit together follows the list:

  • Item Development and Calibration: A high-quality item bank with items calibrated using Item Response Theory (IRT) models is essential.
  • Item Selection Algorithm: Selecting appropriate items based on the test taker's ability level requires a robust algorithm.
  • Ability Estimation: Accurately estimating the test taker's ability level after each item response is critical.
  • Test Termination Rule: Determining when to end the test based on desired precision or time constraints is necessary.
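
To make these four components concrete, here is a minimal sketch of a CAT loop built on a 2PL IRT model, with maximum-information item selection, EAP ability estimation, and a standard-error-based termination rule. The item bank, parameter values, and simulated responses are entirely hypothetical; this is an illustration, not a production algorithm.

```python
import numpy as np

def prob_correct(theta, a, b):
    """2PL IRT model: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def estimate_ability(responses, items, grid=np.linspace(-4, 4, 161)):
    """EAP ability estimate on a grid with a standard normal prior."""
    log_like = np.zeros_like(grid)
    for (a, b), u in zip(items, responses):
        p = prob_correct(grid, a, b)
        log_like += u * np.log(p) + (1 - u) * np.log(1.0 - p)
    posterior = np.exp(log_like - log_like.max()) * np.exp(-grid ** 2 / 2)
    posterior /= posterior.sum()
    theta = float((grid * posterior).sum())
    se = float(np.sqrt(((grid - theta) ** 2 * posterior).sum()))
    return theta, se

rng = np.random.default_rng(0)
# Hypothetical calibrated bank: (discrimination a, difficulty b) pairs
bank = [(1.2, -1.0), (0.9, -0.3), (1.5, 0.0), (1.1, 0.6), (1.3, 1.2)]
true_theta = 0.5           # used only to simulate the test taker's answers
administered, responses = [], []
theta, se = 0.0, 1.0       # start at the prior mean

while se > 0.3 and len(administered) < len(bank):
    # Item selection: the unused item with maximum information at theta
    unused = [i for i in range(len(bank)) if i not in administered]
    best = max(unused, key=lambda i: item_information(theta, *bank[i]))
    administered.append(best)
    # In production this would be the test taker's actual response
    u = int(rng.random() < prob_correct(true_theta, *bank[best]))
    responses.append(u)
    # Re-estimate ability; the standard error doubles as the termination rule
    theta, se = estimate_ability(responses, [bank[i] for i in administered])

print(f"Estimated ability: {theta:.2f} (SE = {se:.2f}, items: {len(administered)})")
```

Note how the four requirements map directly onto the code: the calibrated bank, the selection rule, the ability estimator, and the stopping condition are separate, swappable pieces.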

Technical Infrastructure

As the name suggests, computerised adaptive testing requires digital capabilities such as the following; a brief sketch of how these components might interact appears after the list:

  • Item Delivery System: A platform to present items to test-takers efficiently and securely.
  • Data Management System: A system to store and manage test-taker data, item parameters, and test results.
  • Adaptive Algorithm Implementation: Coding the CAT algorithm to interact with the item delivery and data management systems.
  • Security and Privacy: Protecting test taker data and maintaining test security is paramount.
  • Scalability: The system should be able to handle a large number of test takers simultaneously.
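
One hedged way to picture the separation between the adaptive engine, the delivery platform, and the data management layer is sketched below. All class and method names are hypothetical; a real deployment would sit behind a secure, scalable database and an authenticated delivery front end.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Item:
    item_id: str
    a: float  # discrimination
    b: float  # difficulty

class ItemStore(Protocol):
    """Contract the adaptive engine needs from the data management layer."""
    def unused_items(self, session_id: str) -> list[Item]: ...
    def record_response(self, session_id: str, item_id: str, correct: bool) -> None: ...

@dataclass
class InMemoryStore:
    """Toy data layer; a real system would use a secure, scalable database."""
    bank: list[Item]
    responses: dict[str, dict[str, bool]] = field(default_factory=dict)

    def unused_items(self, session_id: str) -> list[Item]:
        seen = self.responses.get(session_id, {})
        return [item for item in self.bank if item.item_id not in seen]

    def record_response(self, session_id: str, item_id: str, correct: bool) -> None:
        self.responses.setdefault(session_id, {})[item_id] = correct

class AdaptiveEngine:
    """Chooses the next item; the delivery platform only renders and records."""
    def __init__(self, store: ItemStore) -> None:
        self.store = store

    def next_item(self, session_id: str, theta: float) -> Item | None:
        candidates = self.store.unused_items(session_id)
        if not candidates:
            return None  # bank exhausted: hand over to the termination rule
        # Simple heuristic: pick the item whose difficulty best matches theta
        return min(candidates, key=lambda item: abs(item.b - theta))
```

Keeping the engine behind a narrow interface like this is what makes the scalability and security requirements tractable: storage and delivery can be hardened or scaled independently of the psychometric logic.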

It’s also crucial to think about some additional considerations for the safe and ethical deployment of CAT:

  • User Interface: A user-friendly interface for test takers is essential for a positive test taking experience.
  • Accessibility: Ensuring the test is accessible to individuals with disabilities is crucial.
  • Quality Assurance: Rigorous testing and validation of the CAT system are required.
  • Evaluation and Improvement: Ongoing monitoring and evaluation of test performance is essential for continuous improvement.
  • Legal and Ethical Compliance: Adhering to relevant regulations and ethical guidelines is mandatory.

So in this age of AI, the question arises: how can AI give CAT that much-needed 21st-century upgrade?

With respect to item generation, generative AI can be used to create large, diverse pools of high-quality items, tailored to specific learning objectives and difficulty levels. Machine learning models, trained on extensive datasets of previously validated items, can generate new items whilst maintaining consistency with established psychometric and other relevant properties. This AI-driven approach not only accelerates the item creation process but, more importantly, ensures a continuous supply of fresh, relevant content, reducing the risk of item exposure and test security breaches.
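
A minimal sketch of what this might look like is below. The `llm_complete` callable is a hypothetical stand-in for whichever model API the platform uses, and the prompt format is illustrative; crucially, generated items enter a human review and pilot-calibration pipeline rather than the live bank.

```python
import json

# Prompt template constraining topic, Bloom's level, and target difficulty
ITEM_PROMPT = (
    "Write one multiple-choice question on {topic} at Bloom's "
    "'{bloom_level}' level, targeting IRT difficulty b ~ {difficulty:.1f}. "
    "Return JSON with keys: stem, options (list of 4), answer_index, rationale."
)

def draft_item(llm_complete, topic: str, bloom_level: str, difficulty: float) -> dict:
    """Generate a draft item; it must pass review before reaching the live bank."""
    prompt = ITEM_PROMPT.format(topic=topic, bloom_level=bloom_level,
                                difficulty=difficulty)
    raw = llm_complete(prompt)  # hypothetical call to whichever model API is used
    item = json.loads(raw)      # fail loudly on malformed model output
    item["status"] = "draft"    # drafts need human review and pilot calibration
    return item
```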

AI significantly improves the adaptive nature of CAT by employing sophisticated algorithms that go beyond traditional IRT-based methods. These AI systems can consider multiple factors simultaneously, such as content balance, item exposure rates, and test taker characteristics, to select the most informative and appropriate items in real-time. This results in shorter, more precise assessments that provide a more accurate measure of the test taker's ability level.
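
As a hedged illustration of such multi-constraint selection, the sketch below blends item information with content balance and exposure control. The field names, the exposure cap, and the top-k "randomesque" rule are illustrative choices, not a prescribed method.

```python
import random

def select_item(candidates, theta, content_needed, exposure_rate, info_fn,
                max_exposure=0.25, top_k=5):
    """Blend information, content balance, and exposure control."""
    # Filter to under-covered content areas and items below the exposure cap
    eligible = [item for item in candidates
                if item["domain"] in content_needed
                and exposure_rate.get(item["id"], 0.0) < max_exposure]
    if not eligible:
        eligible = candidates  # relax constraints rather than stall the test
    # Randomesque rule: draw from the k most informative eligible items,
    # which curbs over-exposure of the single most informative item
    ranked = sorted(eligible, key=lambda item: info_fn(theta, item), reverse=True)
    return random.choice(ranked[:top_k])
```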

And when it comes to item scoring, AI, particularly natural language processing (NLP) and machine learning techniques, enables more nuanced evaluation of complex response types, including short answers and essays written in response to open-ended questions. These AI-powered scoring systems can assess not only the correctness of responses but also their coherence, creativity, and depth of understanding, providing a more comprehensive evaluation of the test taker's abilities.
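
One simple ingredient of such systems is semantic similarity between a student's answer and a model answer, sketched below using sentence embeddings (here via the sentence-transformers library). The mark bands and thresholds are purely illustrative; real scoring systems layer rubric checks and human moderation on top of anything like this.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small public embedding model

def similarity_score(student_answer: str, model_answer: str) -> float:
    """Cosine similarity between the student's answer and a model answer."""
    emb = model.encode([student_answer, model_answer], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

sim = similarity_score("Plants make glucose using light energy",
                       "Photosynthesis converts light energy into glucose")
marks = 2 if sim > 0.8 else 1 if sim > 0.5 else 0  # illustrative mark bands
print(f"similarity={sim:.2f}, marks={marks}")
```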

More recently, AI has been pushing the boundaries of Item Response Theory itself. Advanced machine learning models can uncover complex, non-linear relationships between item responses and latent traits, leading to more sophisticated and accurate psychometric models. These AI-enhanced IRT models can adapt to changing patterns in test taker behaviours and item characteristics over time, ensuring that the underlying theoretical framework remains robust and relevant.
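
As a toy illustration of this direction, the sketch below keeps the familiar IRT ingredients, a latent ability and an item difficulty, but lets a small neural network learn the shape of the response curve instead of fixing the logistic form. It assumes PyTorch and is illustrative only.

```python
import torch
import torch.nn as nn

class NeuralIRT(nn.Module):
    """IRT-style model where a small network learns the response curve."""
    def __init__(self, n_people: int, n_items: int, hidden: int = 16):
        super().__init__()
        self.theta = nn.Embedding(n_people, 1)  # latent abilities
        self.b = nn.Embedding(n_items, 1)       # item difficulties
        self.link = nn.Sequential(              # learned, possibly non-logistic link
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, person_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        z = self.theta(person_ids) - self.b(item_ids)   # ability-difficulty gap
        return torch.sigmoid(self.link(z)).squeeze(-1)  # P(correct)

# Trained with binary cross-entropy on (person, item, correct) triples,
# such a model can recover response curves a fixed logistic form would miss.
```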

Additionally, AI can facilitate the integration of new types of data, such as response times and interaction patterns, into IRT models, providing a more holistic view of the test taker's performance. By continuously learning from testing data, AI systems can also detect and mitigate potential biases in item functioning across different demographic groups, enhancing the fairness and validity of assessments.

Ultimately, the integration of AI into computerised adaptive testing is not just an incremental improvement but a transformative upgrade to a traditional form of assessment, one that deserves wider uptake, especially at the high-stakes level, to offer more personalised, efficient, and insightful knowledge assessments.
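
To close with one concrete illustration of that bias-detection point: a very simple differential item functioning (DIF) screen can be run with off-the-shelf tools, regressing item correctness on ability plus group membership. The simulated data, 0/1 group flag, and effect size below are purely illustrative; a non-negligible group coefficient would flag the item for expert review, not automatic removal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dif_screen(correct, theta, group):
    """Group coefficient after controlling for ability; large values flag DIF."""
    X = np.column_stack([theta, group])
    model = LogisticRegression().fit(X, correct)
    return model.coef_[0][1]

rng = np.random.default_rng(0)
theta = rng.normal(size=200)                  # ability estimates
group = rng.integers(0, 2, size=200)          # 0/1 demographic flag
p = 1 / (1 + np.exp(-(theta - 0.3 * group)))  # simulate a biased item
correct = rng.binomial(1, p)
print(f"Group coefficient: {dif_screen(correct, theta, group):+.2f}")
```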
