The competition among leading artificial intelligence companies is becoming increasingly intense as new benchmark results show rapid improvements in AI reasoning performance.
Data from TrackingAI’s April 2026 Mensa Norway benchmark reveals that several frontier AI systems now score at or above the “genius-level” IQ threshold of 130, highlighting how quickly advanced AI models are evolving.
The latest rankings place xAI’s Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) at the top of the leaderboard, each achieving a score of 145.
The results were published as part of Visual Capitalist’s AI Week 2026 report, which analyzed leading AI models using the public Mensa Norway IQ test.
Grok and OpenAI Share the Top Position in AI IQ Rankings
The benchmark results show an increasingly crowded race among the world’s top AI developers. Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) tied for first place with IQ scores of 145.
Google’s Gemini 3.1 Pro Preview followed closely behind with a score of 141, while OpenAI GPT 5.4 Thinking (Vision) reached 139. OpenAI GPT 5.3 ranked next with a score of 136.
Several other leading models also surpassed the “genius-level” benchmark of 130. Grok-4.20 Expert Mode (Vision), OpenAI GPT 5.4 Thinking, and Meta Muse Spark each scored 133. Gemini 3.1 Pro Preview (Vision) achieved 132, while Qwen 3.5 and Claude-4.6 Opus both recorded scores of 130.
Other models ranked lower in the dataset, including Kimi K2.5 with 127, Manus with 115, DeepSeek R1 with 112, and DeepSeek V3 with 111.
Among major AI developers, Mistral’s top model ranked among the lowest performers in the benchmark with a score of 97.
AI Benchmark Scores Rise Sharply From 2025
The latest benchmark results also show how quickly frontier AI systems are improving. According to TrackingAI’s data, the highest score recorded in 2025 was 135. In 2026, the top score increased to 145.
The narrow score differences between top models suggest that leading AI companies are rapidly converging in reasoning performance. A gap of only a few points now separates many of the industry’s most advanced systems.
Visual Capitalist noted that the top tier has become increasingly compressed, with multiple frontier models clustered closely together on the leaderboard.
How TrackingAI Measures AI Intelligence
TrackingAI uses the public Mensa Norway IQ test, which contains 35 visual pattern-recognition puzzles.
Vision models receive the original puzzle images directly, while non-vision models answer verbalized text versions of the same questions.
Because the test is fundamentally visual, results can vary depending on how the questions are presented to the model.
TrackingAI states that the rankings should be viewed as benchmark comparisons rather than definitive measurements of overall intelligence.
The organization also noted that if a model refuses to answer a question, the same question may be repeated up to 10 times before the final answer is used for scoring.
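The retry rule described above can be sketched in a few lines. This is an illustrative reconstruction only, not TrackingAI's actual implementation; `ask_model` and the toy model below are hypothetical stand-ins for a real API call.

```python
# Sketch of the scoring protocol described above (hypothetical code):
# each puzzle is posed to the model, and a refusal is retried up to
# 10 times; the final response obtained is the one that gets scored.

MAX_RETRIES = 10

def ask_model(model, question):
    """Placeholder for a model API call; assumed to return an answer
    string, or None when the model refuses to answer."""
    return model(question)

def score_test(model, questions, answer_key):
    """Count correct answers across the test, applying the retry rule."""
    correct = 0
    for question, expected in zip(questions, answer_key):
        answer = None
        for _ in range(MAX_RETRIES):
            answer = ask_model(model, question)
            if answer is not None:  # got an answer, stop retrying
                break
        if answer == expected:
            correct += 1
    return correct

# Toy model that refuses twice before answering "B":
responses = iter([None, None, "B"])
toy_model = lambda q: next(responses)
print(score_test(toy_model, ["puzzle-1"], ["B"]))  # → 1
```

The raw correct-answer count would then be mapped to an IQ score via the test's norming table, a step omitted here because the published mapping is not part of the source material.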
The benchmark focuses primarily on abstract reasoning and pattern recognition. It does not evaluate broader AI capabilities such as coding performance, factual reliability, professional expertise, or tool usage.
New Tests Suggest Different AI Models Excel in Different Areas
Separate testing from OmniCalculator suggests that benchmark intelligence may not fully reflect how users experience AI systems in daily use.
The report stated that xAI’s Grok 4.2 demonstrated strong performance in logic and mathematical problem-solving tasks.
According to the findings, Grok 4.2 reduced instability during complex reasoning tasks to 33.1%, significantly lower than older AI systems that reportedly revised or second-guessed their answers around 60% of the time.
However, OmniCalculator noted that Claude 4.6 continues to stand out in writing quality, tone consistency, and handling long documents.
The report stated that Claude’s communication style often feels more measured because the model is more willing to acknowledge uncertainty in its responses.
The findings suggest that AI intelligence is becoming increasingly specialized. Some models perform better in reasoning and mathematics, while others excel in natural language generation, coherence, and conversational tone.
Industry observers also noted that no single AI system dominates every category. The best-performing model can vary depending on the task, whether it involves coding, logical reasoning, writing, or technical problem-solving.
This article is a summary of several original articles. The full versions can be read at the following links:
https://www.visualcapitalist.com/ranked-the-smartest-ai-models-in-2026/
https://trackingai.org/home?utm_source=chatgpt.com
https://www.techradar.com/ai-platforms-assistants/everyones-switching-from-chatgpt-to-claude-but-new-tests-say-neither-is-the-smartest-free-ai-and-the-real-winner-might-surprise-you
PHOTO: UNSPLASH
This article was created with AI assistance.
We make every effort to ensure the accuracy of our content, but some information may be incorrect or outdated. Please let us know of any corrections at [email protected]
Wednesday, 13-05-26
