As we close out the first quarter of 2025, we’re already seeing a fascinating shift in focus. New methods, from training algorithms to the <thinking> tag to simply giving a model time to *think* before answering, are here to stay. Reasoning capabilities are among the defining frontiers in generative AI, with major players and open-source communities pushing boundaries in how models think, act, and interact. We’re not going back.
Can you believe it wasn’t all that long ago that we were saying, “AI can do amazing things, but we have no idea how it arrives at its answers”? Those days already feel distant, especially with Anthropic’s recent research revealing what happens inside the black box, and this is a change I particularly enjoy. Why? Because showing how a model arrives at an answer is powerful and instructive, and it’s exciting that we’re making real progress toward explainability. That said, there’s far more to learn!
Over the last few weeks, we’ve seen service after service reveal more about what’s happening inside the model as responses are generated. These ‘thinking models’ and circuit tracing approaches are fun and deserve more time and attention. It’s been a hectic month of model updates, new features, and incredible new capabilities. I’ve hardly had time to put down Deep Research across Gemini, OpenAI, and Perplexity, and these same providers have all released new image generation features (Google and OpenAI in particular) that I haven’t been able to put down either. There are no excuses for not trying these services; many are entirely free! Google is making everything accessible in AI Studio, and OpenAI has already shared plans to make the new GPT-4o image generation available at no cost (limitations apply, but you have at least enough to experiment and make AI MISTAKES!).
This is a fundamental shift. Until recently, AI models have been black boxes, and often locked behind paywalls as well. So far in 2025, we not only have a way to understand how a model came up with its answer, but can also use that to break down a problem, learn from it, and even correct the approach in future prompts. It’s incredible what’s already possible in 2025!
The significance of this cannot be overstated. AI has long been about prediction, but now it’s about understanding, explanation, and exploration.
Notable Headlines
Anthropic
Claude 3.7: Claude 3.7 improves performance in coding, math, and reasoning tasks, while reducing hallucinations and increasing speed.
MCP: The Model Context Protocol (MCP) is Anthropic's open protocol for connecting models to external tools and data sources, designed with future agentic systems in mind.
Google
Gemini 2.0 Flash: A lightweight, efficient version of Google's Gemini 2.0 model, optimized for cost-effective performance and fast response times.
Gemini 2.5: The latest iteration of the Gemini model family, showcasing advancements in reasoning and multi-modal performance.
Gemma 3: An open model developed by Google DeepMind as part of their push for responsible open-source AI.
TxGemma: An open-source variant of Gemma fine-tuned for biomedical and therapeutics-related tasks.
OpenAI
o3-mini: A compact and efficient model optimized for low-latency tasks and edge deployments.
GPT-4.5: A transitional model between GPT-4 and GPT-5, offering improved speed and reasoning capabilities.
GPT-4o: An omni-modal model that processes text, image, and audio simultaneously for richer contextual understanding.
GPT-4o-transcribe: A next-gen audio model capable of near real-time transcription and translation with high accuracy.
Microsoft
Think Deeper in Copilot: Introduces the "Think Deeper" capability in Microsoft 365 Copilot using OpenAI’s o3-mini model for enhanced reasoning, faster performance, and visual enhancements.
Copilot Release Notes - March 2025: Details new features, including Copilot availability on GroupMe and Viber, and product price tracking.
Copilot Vision Preview: Announces the preview release of Copilot Vision, enabling camera-based real-time assistance, especially for Android users in the U.S.
Perplexity
Deep Research: A new feature that enhances Perplexity’s research capabilities by surfacing citations and deeper source validation.
Cost Optimization of Sonar: An improvement to the Sonar model, offering better performance at a lower cost than some OpenAI offerings.
Learning Tools and Experiments
OpenAI Responses API + Agents SDK
https://platform.openai.com/docs/guides/agents-sdk
Cloudflare Agents
https://developers.cloudflare.com/agents/
OWL: Built on the CAMEL-AI Framework to improve multi-agent interactions https://github.com/camel-ai/owl
MCP servers
https://github.com/modelcontextprotocol/servers
Mistral OCR
https://mistral.ai/fr/news/mistral-ocr
AI MISTAKES: Learning from the Past, Preparing for the Future
In the introduction to this month’s newsletter, we talked about reasoning and shared the paper Anthropic recently published, describing the experiments they conducted to better understand what’s happening inside the model.
It’s now increasingly clear that learning from mistakes will be a part of either the model-building or the mechanisms that control how a model answers questions.
The reality is that LLMs are trained through a multi-stage process that allows them to “learn” from mistakes and improve their responses. Historically, those improvements have been driven during training, not in real time during a conversation, using techniques such as self-supervised pretraining and reinforcement learning from human feedback (RLHF).
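To make the RLHF idea concrete, here is a minimal sketch: a reward model, trained from human preference data, scores candidate responses, and the reinforcement learning step nudges the model toward higher-scoring ones. The stand-in reward function and its scoring rules below are toy assumptions for illustration, not any lab’s actual implementation.

```python
# Sketch: the RLHF idea in miniature. In practice the reward model is a
# neural network trained on human preference pairs; here a toy function
# stands in for it so the signal it provides is easy to see.

def toy_reward_model(response: str) -> float:
    # Stand-in for a learned reward model: suppose humans preferred
    # direct, substantive answers, so we approximate that crudely.
    score = 0.0
    if "sorry" not in response.lower():
        score += 0.5  # prefer answers that don't refuse
    score += min(len(response.split()), 20) / 20  # substance proxy
    return score

candidates = [
    "Sorry, I can't help with that.",
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
]

# The real RL step updates the model's weights toward higher-reward
# responses; here we just pick the best candidate to show the signal.
best = max(candidates, key=toy_reward_model)
print(best)
```

The point is the division of labor: human judgment is distilled into a reward signal once, and that signal then guides optimization at scale.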
One of the significant research breakthroughs, coming from many organizations trying out different techniques, is that we can learn from what works and what doesn’t in how we train (and use) the models. The people behind DeepSeek have shared that they got great results just by taking the RL and skipping the HF.
Reinforcement Learning without Human Feedback (RL)
The core innovation was to replace the human feedback mechanism with rule-based rewards, so that model training relied on a set of automated trial-and-error optimizations. This took far less time but still led to good results.
Replacing human feedback with automated rewards made the feedback loop fast and cut the effort and time needed to train. The trade-off, however, was having to develop reward rules that could judge logical coherence, task completeness, and output fluency.
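As a sketch of what rule-based rewards can look like, the function below scores a response with simple automated checks for correctness, reasoning format, and a crude fluency proxy. The specific rules and weights are hypothetical assumptions for illustration, not DeepSeek’s actual reward design.

```python
import re

def rule_based_reward(prompt: str, response: str, expected_answer: str) -> float:
    """Score a response with automated rules instead of human feedback.

    The rules and weights here are illustrative assumptions, not the
    reward design used by any particular lab.
    """
    reward = 0.0

    # Correctness rule: does the response contain the expected answer?
    if expected_answer in response:
        reward += 1.0

    # Format rule: reward responses that show reasoning in a <think> block.
    if re.search(r"<think>.*</think>", response, re.DOTALL):
        reward += 0.5

    # Fluency proxy: penalize empty or extremely short responses.
    if len(response.split()) < 3:
        reward -= 0.5

    return reward

# A response with visible reasoning and the right answer scores highest.
good = "<think>2 + 2 is 4</think> The answer is 4."
bad = "5"
print(rule_based_reward("What is 2+2?", good, "4"))  # 1.5
print(rule_based_reward("What is 2+2?", bad, "4"))   # -0.5
```

Because every rule is checkable by a program, millions of trial responses can be scored with no annotation team in the loop.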
The best part of this experimentation was the discovery of yet more so-called “emergent properties,” like the complex reasoning behaviors (chain-of-thought and self-correction) I discussed in the introduction. Nobody explicitly asked for them; they appeared as a byproduct of moving to the new method.
Human feedback, guidance, and instruction are still necessary, but they now happen at different points in the development cycle. This allows a dramatically accelerated schedule and better results without requiring substantial annotation teams to steer the model’s development.
I can’t help but think that as we continue to advance AI models, we will make them ever-so-much more like us. This can only mean one thing: mistakes are here to stay, so we might as well get started.
Now make some mistakes by experimenting and learning with AI!