Google releases Gemini 2.5 as model competition continues

Google’s Gemini 2.5 launches with benchmark gains in reasoning, science, and code tasks

[Image: Gemini 2.5 Pro preview]

On March 25, Google introduced Gemini 2.5 — the latest version of its large language model. The system is described as a “thinking” model, built to reason through prompts instead of just predicting likely responses.

Its release adds another entry to the growing field of high-capacity models, where OpenAI, Anthropic, and Mistral are also shipping frequent updates. Performance differences are narrowing, so workflows, pricing, and usability increasingly shape day-to-day adoption.

Claimed improvements in science and coding

Gemini 2.5 Pro Experimental is claimed to outperform previous versions on science, math, and reasoning benchmarks. It topped LMArena, a benchmark based on human evaluations, and performed well in domain-specific tasks such as AIME, GPQA, and SWE-Bench Verified.

On SWE-Bench Verified, a coding benchmark that uses agent-based problem solving, Gemini 2.5 scored 63.8%, a noticeable increase over the figure previously reported for Gemini 2.0. Google also said the model reaches these scores without test-time techniques such as majority voting, instead embedding reasoning into the model's core.

Performance comparison: how Gemini 2.5 stacks up

Gemini 2.5: Comparison table

Compared to earlier models and competing systems, Gemini 2.5 shows claimed gains across several benchmarks:

  • Reasoning: On Humanity’s Last Exam, Gemini 2.5 scored 18.8%, compared to GPT-4.5’s 6.4% and Claude 3.7’s 8.9%.
  • Science: On the GPQA benchmark, it scored 84.0%, edging past GPT-4.5 (71.4%) and landing close to Claude 3.7 and Grok 3.
  • Mathematics: On AIME 2024, Gemini 2.5 scored 92.0% — higher than other listed models. For AIME 2025, the score was 86.7%.
  • Agentic coding: On SWE-Bench Verified, the model scored 63.8%, ahead of GPT-4.5 (49.3%) though behind Claude 3.7 (70.3%).
  • Code editing and generation: On Aider Polyglot and LiveCodeBench tasks, Gemini 2.5 delivered improvements in both accuracy and output formatting compared to earlier Gemini runs.
  • Context length: Long context performance was also claimed to improve, with 94.5% accuracy on MRCR (Multi Round Coreference Resolution), a test of how models retain references over long text windows.

These results, while measurable on benchmarks, don't capture deployment friction, latency, or downstream integration, the factors that typically shape real adoption.

Extended context and multimodal inputs

Gemini 2.5 supports up to 1 million tokens per prompt, with a 2-million-token window planned. It can process multiple input formats, including text, audio, video, images, and full code repositories.
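To give a rough sense of what a 1-million-token window means in practice, the sketch below estimates whether a batch of documents fits in a single prompt. It uses the common ~4-characters-per-token heuristic, which is an assumption for illustration only, not Gemini's actual tokenizer; real counts would come from the API's token-counting endpoint.

```python
# Rough feasibility check for single-prompt, long-context use.
# The 4-chars-per-token ratio is a crude heuristic (an assumption),
# NOT the actual Gemini tokenizer.

CONTEXT_WINDOW = 1_000_000  # Gemini 2.5's claimed token limit

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate from character count."""
    return int(len(text) / chars_per_token)

def fits_in_context(docs: list[str], reserve: int = 8_192) -> bool:
    """True if all docs plus a reserved output budget fit in one prompt."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve <= CONTEXT_WINDOW

# ~100k + ~300k estimated tokens: comfortably inside the window.
docs = ["x" * 400_000, "y" * 1_200_000]
print(fits_in_context(docs))
```

A check like this is only a pre-flight estimate; anything near the limit should be verified with the provider's own token counter before sending.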

It’s available through Google AI Studio and the Gemini app. Rollout to Vertex AI is expected next.

OpenAI maintains traction in deployment and tools

While Gemini 2.5 shows technical gains, OpenAI’s GPT-4.5 remains widely used for daily workflows. Its API integrations, assistant infrastructure, and model tuning options continue to influence team decisions.

OpenAI has leaned into tool support, while other models — including Claude 3.7 and Grok 3 — are developing similar ecosystems to stay competitive.

Enterprise teams now weigh use cases over model scores

Gemini 2.5 adds another strong contender for teams dealing with scientific, coding-heavy, or long-context tasks. But benchmarks only tell part of the story.

The decision often comes down to context: integration speed, pricing structure, latency, and how well a model fits existing workflows. With more updates coming from Anthropic and open-weight players like Mistral, the field remains fluid.

Right now, the AI race is less about winners and more about fit.

Source: 

Google DeepMind. (2025, March 25). Gemini 2.5: Our most intelligent AI model. https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
