FrontierNews.ai

How AI Models Lose Their Way Across Languages: A New Fix for Multilingual Reasoning

A new research method called SOLAR helps multilingual AI models maintain consistent reasoning across different languages, addressing a fundamental problem where semantically equivalent questions produce different answers depending on whether they're asked in English, Chinese, or other languages. The technique, published on June 26, 2026, improves accuracy by up to 17.7 percentage points over baseline models and 3.8 points over standard training methods on multilingual reasoning benchmarks.

Why Do AI Models Reason Differently in Different Languages?

Large language models (LLMs) that serve global users face an unexpected challenge: they often produce inconsistent reasoning and answers when given semantically equivalent prompts in different languages. A person asking a math problem in English might get a correct answer, while the exact same question in Swahili or Thai yields a different result, even though the underlying logic should be identical.

The root cause lies in how these models process information. Research reveals a consistent pattern: in the early layers of the model's neural network, representations remain largely language-agnostic, meaning the model understands the core concept regardless of language. However, as the model approaches its final layers and prepares to generate output, it becomes increasingly language-specialized. The model must eventually choose specific tokens, the discrete words or word pieces in its vocabulary, in the target language's script, whether that's Latin characters, Chinese characters, or Thai glyphs. This transition from shared semantic understanding to language-specific surface forms acts like a bottleneck: probability that was spread over semantically related words in several languages collapses into a single language-specific choice.

How Does SOLAR Keep Reasoning Aligned Across Languages?

SOLAR, which stands for Soft Token Alignment for Cross-Lingual Reasoning, takes a different approach. Instead of relying on discrete token selection, which forces the model to commit to a single language-specific word, SOLAR uses soft tokens. Soft tokens are probability-weighted mixtures over vocabulary embeddings, creating continuous representations that aggregate information from semantically related tokens across languages.
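To make the idea of a soft token concrete, here is a minimal sketch in plain Python. It is not the researchers' implementation: the four-word vocabulary, the two-dimensional embeddings, and the helper names `softmax`, `soft_token`, and `hard_token` are all hypothetical, chosen only to show how a probability-weighted mixture differs from committing to one token.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_token(probs, embeddings):
    """Probability-weighted mixture of embedding rows (an expectation over the vocabulary)."""
    dim = len(embeddings[0])
    return [sum(p * row[d] for p, row in zip(probs, embeddings)) for d in range(dim)]

def hard_token(probs, embeddings):
    """Discrete decoding: keep only the embedding of the single most likely token."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return embeddings[best]

# Toy embedding matrix (made-up values): three words for "seven" plus one unrelated word.
EMBEDDINGS = [
    [1.0, 0.0],  # "seven" (English)
    [0.9, 0.1],  # "七" (Chinese)
    [0.8, 0.2],  # "saba" (Swahili)
    [0.0, 1.0],  # "dog" (unrelated)
]

probs = softmax([2.0, 1.5, 1.0, -3.0])  # hypothetical next-token scores
soft = soft_token(probs, EMBEDDINGS)    # blends all three "seven" embeddings
hard = hard_token(probs, EMBEDDINGS)    # commits to the English token only
```

In this toy setup the hard token discards everything except the English embedding, while the soft token keeps a contribution from each related word in proportion to its probability.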

The method works during the training phase by using English as a pivot language. For each reasoning trace and final response, SOLAR summarizes the model's next-token distribution as an expectation over the embedding matrix, then pools this across positions to obtain a single continuous vector independent of any individual token identity. The system then minimizes the distance between the English representation and each non-English counterpart, drawing all languages toward an English-anchored semantic space. Because soft tokens operate in a shared embedding space, they enable direct comparison across languages regardless of surface-form differences.
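The pooling and alignment step can be sketched as follows. This is an illustration under stated assumptions rather than the published training code: mean pooling is assumed (the pooling operator is not specified here), cosine distance is used as the distance, and the function names and toy vectors are hypothetical. A real training loop would compute this on model outputs and backpropagate through it.

```python
import math

def mean_pool(vectors):
    """Average a sequence of soft-token vectors into one vector (assumed pooling choice)."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def alignment_loss(english_seq, other_seqs):
    """Mean distance between the pooled English pivot and each pooled non-English sequence."""
    pivot = mean_pool(english_seq)
    distances = {lang: cosine_distance(pivot, mean_pool(seq))
                 for lang, seq in other_seqs.items()}
    return sum(distances.values()) / len(distances), distances

# Toy soft-token sequences (made-up values); lengths differ because
# translations of the same reasoning rarely tokenize to the same length.
english = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4]]
others = {
    "sw": [[0.7, 0.3], [0.9, 0.1]],  # pools to roughly the same vector as English
    "th": [[0.0, 1.0], [0.2, 0.8]],  # pools to a very different vector
}

loss, per_language = alignment_loss(english, others)
```

Pooling first means sequences of different lengths can still be compared, which is why the sketch never needs token-by-token correspondence between languages.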

What Results Did Researchers Achieve?

Testing on the Qwen3 model family revealed substantial improvements. On the AIME 2024 benchmark, SOLAR improved accuracy by up to 3.8 percentage points over standard supervised fine-tuning and cross-lingual consistency by up to 4.5 points. The largest gains appeared on low-resource languages such as Swahili, where models typically struggle due to limited training data.

Representation analysis showed that soft tokens carry a stronger final-layer cross-lingual signal than discrete tokens on base models. SOLAR produced substantially stronger final-layer cross-lingual alignment than both standard supervised fine-tuning and inference-time soft thinking alone. Importantly, these gains preserved native-language reasoning: on the MGSM benchmark, SOLAR-tuned Qwen3-4B reasoned in the target language's script 98.13 percent of the time, compared to 98.81 percent for the base model. For the larger 8B model, the SOLAR-tuned rate reached 98.16 percent, reversing the base model's collapse to English on non-Chinese languages, where the rate for Japanese, Thai, and Telugu had fallen below 5 percent.

Steps to Understand How SOLAR Improves Multilingual AI

  • Discrete Thinking Problem: Traditional chain-of-thought reasoning generates discrete tokens one at a time, forcing the model to commit to language-specific words that can cause semantically equivalent reasoning paths to diverge across languages.
  • Soft Token Solution: SOLAR replaces hard discrete projections with continuous representations formed from probability-weighted mixtures over vocabulary embeddings, allowing information to flow across semantically related tokens in different languages.
  • Cross-Lingual Alignment Training: During supervised fine-tuning, SOLAR aligns soft-token representations between English and non-English parallel reasoning traces by minimizing cosine distance in a shared embedding space, keeping all languages anchored to consistent semantic understanding.
  • Benchmark Validation: The method was tested across four multilingual reasoning benchmarks, showing accuracy improvements of up to 17.7 points over base models and 3.8 points over standard fine-tuning, with particularly strong gains on low-resource languages.
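The points above can be written compactly as a training objective. The notation is introduced here for illustration and is not taken from the paper: $E$ is the embedding matrix with rows $E_v$, $z_t^{(\ell)}$ are the logits at position $t$ for language $\ell$, and $T_\ell$ is the sequence length. Mean pooling, averaging over the set of non-English languages $\mathcal{L}$, and adding the alignment term to the fine-tuning loss with a weight $\lambda$ are assumptions.

```latex
\begin{align}
  p_t^{(\ell)} &= \operatorname{softmax}\bigl(z_t^{(\ell)}\bigr),
  &
  s_t^{(\ell)} &= \sum_{v} p_t^{(\ell)}(v)\, E_v
  && \text{(soft token)} \\
  h^{(\ell)} &= \frac{1}{T_\ell} \sum_{t=1}^{T_\ell} s_t^{(\ell)}
  &&&& \text{(pooled representation, mean pooling assumed)} \\
  \mathcal{L}_{\text{align}} &= \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}}
    \left( 1 - \frac{h^{(\text{en})} \cdot h^{(\ell)}}
                    {\lVert h^{(\text{en})} \rVert \, \lVert h^{(\ell)} \rVert} \right)
  &&&& \text{(cosine distance to the English pivot)} \\
  \mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{SFT}} + \lambda\, \mathcal{L}_{\text{align}}
  &&&& \text{(assumed combination)}
\end{align}
```

Read this way, the alignment term is zero only when every language's pooled soft-token vector points in the same direction as the English one.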

What Makes This Approach Novel?

SOLAR represents the first method to leverage soft tokens, originally introduced for inference-time continuous reasoning, in a cross-lingual alignment objective during post-training. Prior soft-thinking methods used soft tokens mainly as an inference-time generation mechanism, softening discrete decoding decisions but not directly encouraging semantically equivalent reasoning paths in different languages to stay aligned. SOLAR closes this gap by using soft-token representations as a training-time signal for cross-lingual alignment.

The practical implications are significant for organizations deploying AI systems globally. Inconsistent reasoning across languages can undermine trust in AI systems, particularly in high-stakes domains like education, healthcare, and customer support. By ensuring that multilingual models produce consistent answers to equivalent questions regardless of language, SOLAR helps make AI systems more reliable and fair for non-English-speaking users, who represent the majority of the global population.