Tracking AI models’ ‘thoughts’ could reveal how they make decisions, researchers say

17 hours ago 5

AI Chatbots equipped with an ability to reason, or 'reasoning models', produced 543.5 'thinking' tokens per question, whereas concise models -- producing one-word answers -- required just 37.7 tokens per question, the researchers found. AI reasoning models can be used to perform tasks such as solving complex math and science problems. (Image: Freepik)

A broad coalition drawn from the ranks of multiple AI companies, universities, and non-profit organisations have called for deeper scrutiny of AI reasoning models, particularly their ‘thoughts’ or reasoning traces.

In a new position paper published on Tuesday, July 15, the authors said that monitoring the chains-of-thought (CoT) by AI reasoning models could be pivotal to keeping AI agents in check.

Reasoning models such as OpenAI’s o3 differ from large language models (LLMs) such as GPT-4o as the former is said to follow an externalised process where they work out the problem step-by-step before generating an answer, according to a report by TechCrunch.

Reasoning models can be used to perform tasks such as solving complex math and science problems. They also serve as the underlying technology for AI agents capable of autonomously accessing the internet, visiting websites, making hotel reservations, etc, on behalf of users.

This push to advance AI safety research could help shed light on how AI reasoning models work, an area that remains poorly understood despite these models reportedly improving the overall performance of AI on benchmarks.

“CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions,” the paper reads. “Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make the best use of CoT monitorability and study how it can be preserved,” it adds.

The paper calls on leading AI model developers to determine whether CoT reasoning is “monitorable” and to track its monitorability. It urges deeper research on the factors that could shed more light on how these AI models arrive at answers. AI developers should also look into whether CoT reasoning can be used as a safeguard to prevent AI-related harms, as per the document.

Story continues below this ad

But, the paper carries a cautionary note as well. It suggests that any interventions should not make the AI reasoning models less transparent or reliable.

In September last year, OpenAI released a preview of its first-ever AI reasoning model called o1. This launch prompted other companies to release competing models with similar capabilities such as Gemini 2.0, Claude 3.7 Sonnet, and xAI’s Grok 3, among others.

Anthropic researchers have been studying AI reasoning models, with a recent academic study suggesting that AI models can fake CoT reasoning. Another research paper from OpenAI found that CoT monitoring could enable better alignment of AI models with human behaviour and values.

Read Entire Article