FrontierNews.ai

Why AI Math Tutors Aren't Helping Students Learn, According to New Research

A new study reveals a troubling gap between how teenagers say they want to use AI tutors and how they actually use them. Researchers at the University of Tübingen tracked 98 Grade 9 students (ages 14-15) using an AI math tutor during a curriculum-aligned lesson and found that while students frequently asked for help, they almost never verified whether the AI's answers were correct or useful.

What Happened in the Classroom Study?

The study followed students across three public schools in Baden-Württemberg, Germany, as they worked through a mathematical modeling activity using a web-based Mistral Large tutor. Researchers analyzed 1,616 chat turns between students and the AI, along with pre-test and post-test scores, to understand how teenagers actually interact with AI learning tools.

The results were striking. Before the lesson, most students selected learning-focused support options: 82.9% wanted step-by-step examples, 80.3% wanted tips and strategies, and 69.7% said they wanted the AI to check their understanding. Yet their actual behavior told a very different story.

Why Did Students Fail to Monitor Their Learning?

When researchers analyzed what students actually did during the chat, they found a massive disconnect. Requests for help made up 72.9% of students' task-relevant messages. Planning represented 18.1%. But monitoring their own understanding and evaluating the AI's responses accounted for only 5.7% and 3.4% of messages, respectively.

Among the 96 students who made at least one request, 75.1% asked for explanations or guidance that preserved some responsibility for learning. The remaining 24.9% sought complete solutions. Procedural questions dominated at 37.4% of all requests, followed by conceptual questions at 25.1%, answer-seeking at 21.1%, and verification at just 16.3%.

That verification rate is particularly concerning because 69.7% of students had said beforehand that they wanted the AI to check their understanding. The gap between intention and action suggests that students struggle to translate their learning goals into actual behavior when using AI tools.

"Our results suggest that prompt quality alone is not enough to understand students' ability to use AI in advancing their learning," stated Rania Abdelghani, one of the lead researchers.

Rania Abdelghani, Researcher at University of Tübingen

How Did Student Performance Change?

Test scores fell over the course of the lesson. Students scored an average of 67.5% on a pre-test before using the AI tutor and 56.9% on a post-test after the AI-supported activity, a statistically significant decline of 10.6 percentage points.

The researchers caution that this single study does not prove the AI tutor caused the decline, since there was no comparison group of students completing the same task without AI. However, the finding raises important questions about how students manage the cognitive demands of interacting with AI while learning.

After accounting for prior math knowledge, the researchers found that extraneous cognitive load was the only significant predictor of post-test performance: students who experienced higher extraneous load scored lower. This suggests that the mental effort required to construct prompts, manage the interaction, and decide what to do with the AI's responses may have overwhelmed their learning capacity.

What Should AI Tutors Do Differently?

The researchers propose a framework called "epistemic proactivity," which describes a student's ability to identify a learning goal, decide how to seek support from AI, evaluate the response, and determine what further action is needed. They argue that AI tutors should monitor how a student's behavior develops across an entire conversation, not just evaluate individual prompts.

The team identified several practical interventions that AI tutors could implement:

  • Interrupt Answer-Seeking Patterns: An AI tutor could intervene when a learner repeatedly asks for answers, fails to check previous responses, or moves away from mathematical reasoning.
  • Prompt for Understanding: The system could ask students to explain their current understanding, evaluate an answer, or decide what they need next before providing a solution.
  • Encourage Agency: AI tutors could prompt students to request hints rather than solutions, ask for practice questions, or explicitly tell the system to act as a tutor rather than an answer generator.
  • Monitor Trajectory: Teachers and product developers should track how a student's behavior changes from the beginning to the end of each conversation, not just count individual requests.
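To make the idea of watching a whole conversation more concrete, here is a minimal Python sketch of what such checks might look like. It is an illustration only, not the researchers' method: the turn labels, the function names, and the three-turn window are all assumptions made for this example, and a real tutor would first need some way to classify each student message.

```python
# Hypothetical labels for student turns. The study's actual coding scheme may differ.
ANSWER_SEEKING = "answer_seeking"
CHECKING = {"monitoring", "evaluation", "verification"}


def should_intervene(labels, window=3):
    """Return True when the last `window` turns are all answer-seeking.

    The window size is an assumed threshold, not a value from the study.
    """
    if len(labels) < window:
        return False
    return all(label == ANSWER_SEEKING for label in labels[-window:])


def checking_share(labels):
    """Fraction of turns in which the student checks understanding or an answer."""
    if not labels:
        return 0.0
    return sum(label in CHECKING for label in labels) / len(labels)


def trajectory(labels):
    """Answer-seeking share in the first and second half of a conversation."""
    mid = len(labels) // 2

    def share(part):
        return part.count(ANSWER_SEEKING) / len(part) if part else 0.0

    return share(labels[:mid]), share(labels[mid:])


# An invented conversation that starts with planning and drifts toward answer-seeking.
conversation = [
    "planning", "conceptual", "verification",
    "answer_seeking", "answer_seeking", "answer_seeking",
]
print(should_intervene(conversation))  # True
print(trajectory(conversation))        # (0.0, 1.0)
```

In this invented example, a simple count would show that half of the student's messages were answer-seeking. The trajectory view shows more: all of those messages came in the second half of the conversation, which is the kind of drift the researchers say a tutor should notice.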

"AI-supported learning should be understood as a process, not as a collection of isolated prompts," Abdelghani added.

What Comes Next for This Research?

The current study is preliminary. The behavioral coding relied on AI-generated analysis, and human validation by mathematics education specialists is still underway. The researchers have not yet assessed whether individual AI responses were accurate, whether students accepted incorrect information, or how the quality of their behavior changed over time.

The work-in-progress paper has been accepted for the NextGen Learning Interfaces Workshop at the AIED 2026 conference in Seoul. The team plans to validate their coding framework, complete analysis of "epistemic vigilance" (whether students notice when the tutor provides inaccurate answers), and assess "agency over the AI" (whether students actively shape the tutor's role).

The broader question remains open: can AI tutors be redesigned to help students develop the monitoring and evaluation skills that this study shows are missing? The answer may depend less on the quality of the AI itself and more on how the system guides students to think critically about what they're learning.