DEV Community

🔥Claude Opus 4 vs. Gemini 2.5 Pro vs. OpenAI o3 Coding Comparison 🚀

Shrijal Acharya on May 26, 2025

Anthropic just launched two new AI models, Claude Opus 4 and Claude Sonnet 4 (a drop-in replacement for Claude 3.7 Sonnet), which hit the market on...

Data with Johnson • May 27

Coding comparisons are nice, how much $$$ did they cost?

Andreas Tasoulas • May 27

I think it can also distort the findings. Is Opus available for free? If not, why is it not compared to o4-mini-high, for example?

Shrijal Acharya • May 27

No, Claude Opus 4 is not a free model. You can use Sonnet 4 for free with limited prompts. I picked models that score pretty similarly in the benchmarks; there was no other specific reason behind the picks.

I could include o4-mini-high, but that's a somewhat lower-performing model for coding, especially when compared with the best AI model for coding, Claude Opus 4.

I've used o4-mini in one of the comparison blogs, you can check it out here.

Shrijal Acharya • May 27

Not much, though. The other two models are pretty cheap; it's just Claude Opus 4 that has a slightly higher price per million input/output tokens.

Agoing Far • May 27

$75 is only slightly higher than $15???

Гена Мороз • May 27

I do not understand. Is this the only use case? Why does everybody try to create games or 3D visualizations with AI? Why don't you pick real, practical examples where commercial development could be demonstrated?

Shrijal Acharya • May 27

It's pretty tough to figure out an exact real, practical example to use for this kind of testing. If I had one, I'd definitely use it.

Nevo David • May 26

Pretty insane how fast these models level up - I gotta admit, seeing AI spit out better code than me kinda stings but also fires me up to keep learning.

Shrijal Acharya • May 26

The same situation is there for many devs nowadays :)

Chandan K Sahu • May 26

true

Shrijal Acharya • May 27

🙌

Nabin Bhardwaj • May 26

Don't tell me this was built in one shot. 🤯 Are we cooked then? How is it building such a thing in pure HTML/CSS/JS? I don't think there is much data like this that they have been trained on.

Shrijal Acharya • May 27

Yes, it did it in one shot and I'm not kidding.

Agoing Far • May 27

What about testing this on hard-level LeetCode questions? I tried Sonnet 4 on one of the hard LeetCode questions and it got Time Limit Exceeded. Clearly the AIs are heavily trained on web dev, but when it comes to general coding, I'd say Gemini is better.

Shrijal Acharya • May 27

For this test, I decided to focus entirely on building stuff and not on algo/LeetCode questions. You can check out some of my earlier comparisons, where I've tested most of the models on LeetCode and Codeforces questions as well.

joq qy • May 28

Still just toy factories.
If let loose in the wild by some greedy CEO who wants to save money on heavy review, the code these models produce will end up killing people en masse.

Lara Stewart - DevOps Cloud Engineer • May 28

Don't forget, AI is just starting out, and there are still going to be lots and lots of improvements in the coming years. The models already launched are also going to get tons of upgrades going forward.

Shrijal Acharya • May 28

Maybe, maybe not.

Dotallio • May 26

Super comprehensive breakdown - it's wild to see how fast these coding models are leveling up! Have you tried plugging Claude Opus 4 into your actual dev workflow yet, or just for these benchmarks?

Shrijal Acharya • May 27

Yeah, the real test comes with the actual dev workflow. I have yet to use it properly in my dev workflow, but so far it's been doing fine.

Shayne Villarin • May 27

For a few more years, companies will keep launching new AI models to capture tech, like Microsoft owning the entire dev ecosystem. It's a never-ending rat race.

Shrijal Acharya • May 27

You said it right. Currently it seems like a race between Google, Anthropic, OpenAI, and a few others. All these new models launching every few weeks with such improvements are really out of this world to me. At least we are getting something better and better with every single release.

Shayne Villarin • May 27

I don't need any of this.

Shrijal Acharya • May 26

Folks, let me know your thoughts on this model, Claude Opus 4, in the comments. This one is wild, and do you see it being your go-to model when it comes to coding? 👀

Justin Morales • May 26

Claude Opus 4 represents a significant advancement in AI language models, particularly because of its emphasis on coding, long-term task management, and safety. Unlike traditional models, Opus 4 appears to focus heavily on supporting complex workflows and reinforcing ethical guidelines, making it suitable for professional and sensitive applications.

Shrijal Acharya • May 27

Thank you for this info. Have you tried using Opus 4 in your workflow?

Shekhar Rajput • May 26

Thank you for sharing this insightful blog

Shrijal Acharya • May 27

You’re welcome :)

Comment deleted

Shrijal Acharya • May 27

I am not so sure if I really understand what you are saying, Nathan! :)

Ler Khachatrian • May 27

o3 may have fallen short on frontend beauty; however, when it comes to complex backend architectures or nuanced bug fixes, you might be surprised to see how much better it is than other models.

Shrijal Acharya • May 28

Totally. I've heard good things about this model on the backend side. For the sake of this blog, I've kept more focus on core logic and implementation, yk.

I've done a comparison post, not necessarily with o3, but o3-mini-high. If you'd like to take a look: dev.to/composiodev/claude-37-sonne...