Developers describe Grok 4.1 as the most significant leap forward in the history of the model line. The new version is noticeably more natural and human-like in conversation: it picks up emotional nuance better, sustains meaningful dialogue, and offers more compelling creative solutions, while retaining the analytical accuracy that distinguished previous versions.

An example of Grok 4.1 responding to the prompt “Generate a photo of a Kazinform News Agency correspondent.”

Photo credit: Grok

In blind comparison trials, Grok 4.1 was preferred over the previous production model 64.78% of the time. In the international LMArena Text Arena ranking, Grok 4.1 Thinking took first place with 1483 Elo, outperforming its closest competitor by 31 points.
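To put the 31-point lead in perspective, the gap can be converted into an expected head-to-head preference rate. The sketch below assumes the classic Elo logistic formula with a 400-point scale; LMArena's exact rating procedure may differ, and the function name is illustrative only.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win share for the higher-rated side under the
    standard Elo logistic curve (400-point scale, base 10).
    Assumption: this is the textbook formula, not necessarily
    the exact method used by the leaderboard."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


gap = 31  # Elo lead over the closest competitor, as reported above
share = elo_expected_score(gap)
print(f"Expected preference rate at a {gap}-point gap: {share:.1%}")
```

Under this assumption, a 31-point lead corresponds to a modest edge in any single pairwise comparison rather than a decisive one.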

One of the key breakthroughs is a sharp reduction in hallucinations, that is, factual errors in responses. According to xAI, the rate of inaccuracies has dropped nearly threefold, from 12.09% to 4.22%, based on evaluations using real-world production queries.
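The size of that drop can be checked with simple arithmetic on the two percentages reported by xAI. This is a minimal sketch; the variable names are illustrative and not part of any xAI tooling.

```python
# Hallucination rates reported by xAI, in percent
before_pct = 12.09
after_pct = 4.22

# How many times smaller the new rate is than the old one
ratio = before_pct / after_pct

# Share of the original error rate that was eliminated
relative_drop = (before_pct - after_pct) / before_pct

print(f"Reduction factor: {ratio:.2f}x")
print(f"Relative reduction: {relative_drop:.1%}")
```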

The model also performed strongly in emotional intelligence testing (EQ-Bench), scoring 1586 Elo, and posted top-tier results in the Creative Writing v3 benchmark, ranking just below Polaris Alpha, an early experimental version of GPT-5.1.

Ahead of the public release, Grok 4.1 underwent a silent two-week rollout on X and in the mobile apps: from November 1 to 14, the model was tested on live traffic in blind comparison mode. The company emphasizes that the ultimate goal is to create an AI capable not only of accurate analysis but also of meaningful emotional interaction, bringing machine dialogue closer to human conversation.

