DEV Community

🔥Claude Opus 4 vs. Gemini 2.5 Pro vs. OpenAI o3 Coding Comparison 🚀

Shrijal Acharya on May 26, 2025

Anthropic just launched two new AI models, Claude Opus 4 and Claude Sonnet 4 (a drop-in replacement for Claude 3.7 Sonnet), which hit the market on...
 
Data with Johnson

Coding comparisons are nice, but how much $$$ did they cost?

Andreas Tasoulas

I think it can also distort the findings. Is Opus available for free? If not, why is it not compared to o4-mini-high, for example?

Shrijal Acharya

No, Claude Opus 4 is not a free model. You can use Sonnet 4 for free with limited prompts. I picked models that score pretty similarly in the benchmarks; there was no other specific reason for the selection.

I could have included o4-mini-high, but it's a somewhat lower-performing model for coding, especially compared with Claude Opus 4, which I consider the best AI model for coding.

I've used o4-mini in one of my comparison blogs; you can check it out here.

Shrijal Acharya

Not much, though. The other two models are pretty cheap; it's just Claude Opus 4 that has a slightly higher price per million input/output tokens.

Agoing Far

$75 is only slightly higher than $15???
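
To put those two figures in perspective, here is a rough sketch of the arithmetic. It assumes both numbers are prices in USD per million output tokens (the thread does not say which models or token types they refer to), and the 20,000-token response size is hypothetical, not a measurement from the article.

```python
# Illustrative only: the two prices are the figures quoted in this thread,
# assumed here to be USD per million output tokens.
PRICE_PER_MILLION_USD = {
    "higher_priced_model": 75.0,
    "lower_priced_model": 15.0,
}


def output_cost_usd(tokens: int, price_per_million: float) -> float:
    """Return the cost in USD of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical response size, chosen only to make the numbers concrete.
tokens = 20_000

for name, price in PRICE_PER_MILLION_USD.items():
    print(f"{name}: ${output_cost_usd(tokens, price):.2f}")

# The ratio does not depend on the response size.
ratio = (
    PRICE_PER_MILLION_USD["higher_priced_model"]
    / PRICE_PER_MILLION_USD["lower_priced_model"]
)
print(f"price ratio: {ratio:.0f}x")
```

Under these assumptions the same response costs $1.50 versus $0.30, a 5x difference at any response size.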

Гена Мороз

I do not understand. Is this the only use case? Why does everybody try to create games or 3D visualizations with AI? Why don't you pick real, practical examples that demonstrate commercial development?

Shrijal Acharya

It's pretty tough to come up with a real, practical example to use for this kind of testing. If I had one, I'd definitely use it.

Nevo David

Pretty insane how fast these models level up - I gotta admit, seeing AI spit out better code than me kinda stings but also fires me up to keep learning.

Shrijal Acharya

Many devs are in the same situation nowadays :)

Chandan K Sahu

true

Shrijal Acharya

🙌

Nabin Bhardwaj

Don't tell me this was built in one shot. 🤯 Are we cooked then? How is it building such a thing in pure HTML/CSS/JS? I don't think there is much data like this that they have been trained on.

Shrijal Acharya

Yes, it did it in one shot and I'm not kidding.

Agoing Far

What about testing this on hard-level LeetCode questions? I tried Sonnet 4 on one of the hard LeetCode questions and it got Time Limit Exceeded. Clearly the AIs are heavily trained on web dev, but when it comes to general coding I'd say Gemini is better.

Shrijal Acharya

For this test, I decided to focus entirely on building stuff and not on algo/LeetCode questions. You can check out some of my earlier comparisons, where I've tested most of the models on LeetCode and Codeforces questions as well.

joq qy

Still just toy factories.
If let loose in the wild by some greedy CEO who wants to save money on heavy review, the code these models produce will end up killing people en masse.

Lara Stewart - DevOps Cloud Engineer

Don't forget, AI is just starting out and there are still going to be lots and lots of improvements in the coming years. The models already launched are also going to get tons of upgrades moving forward.

Shrijal Acharya

Maybe, maybe not.

Dotallio

Super comprehensive breakdown - it's wild to see how fast these coding models are leveling up! Have you tried plugging Claude Opus 4 into your actual dev workflow yet, or just for these benchmarks?

Shrijal Acharya

Yeah, the real test comes with the actual dev workflow. I have yet to use it properly in my dev workflow, but so far it's been doing fine.

Shayne Villarin

For a few more years, companies will keep launching new AI models to capture the tech market, much like Microsoft owning the entire dev ecosystem. It's a never-ending rat race.

Shrijal Acharya

You said it right. Currently it seems like a race between Google, Anthropic, OpenAI, and a few others. All these new models launching every few weeks with such improvements are really out of this world to me. At least we are getting something better with every single release.

Shayne Villarin

I don't need any of this.

Shrijal Acharya

Folks, let me know your thoughts on this model, Claude Opus 4, in the comments. This one is wild. Do you see it becoming your go-to model for coding? 👀

Justin Morales

Claude Opus 4 represents a significant advancement in AI language models, particularly because of its emphasis on coding, long-term task management, and safety. Unlike traditional models, Opus 4 appears to focus heavily on supporting complex workflows and reinforcing ethical guidelines, making it suitable for professional and sensitive applications.

Shrijal Acharya

Thank you for this info. Have you tried using Opus 4 in your workflow?

Shekhar Rajput

Thank you for sharing this insightful blog

Shrijal Acharya

You’re welcome :)

Comment deleted
 
Shrijal Acharya

I'm not sure I really understand what you're saying, Nathan! :)

Ler Khachatrian

o3 may have fallen short on frontend beauty; however, when it comes to complex backend architectures or nuanced bug fixes, you might be surprised to see how much better it is than other models.

Shrijal Acharya

Totally. I've heard good things about this model on the backend side. For the sake of this blog, I kept the focus more on core logic and implementation, you know.

I've done a comparison post, not with o3 itself but with o3-mini-high. If you'd like to take a look: dev.to/composiodev/claude-37-sonne...