
Anthropic just launched two new AI models, Claude Opus 4 and Claude Sonnet 4 (a drop-in replacement for Claude 3.7 Sonnet), which hit the market on...
Coding comparisons are nice, how much $$$ did they cost?
I also think it can distort the findings. Is Opus available for free? If not, why isn't it compared to o4-mini-high, for example?
No, Claude Opus 4 is not a free model. You can use Sonnet 4 for free with a limited number of prompts. I picked models that score fairly similarly on the benchmarks; there was no special reason behind the picks.
I could include o4-mini-high, but it's a somewhat weaker model for coding, especially compared with the best AI model for coding, Claude Opus 4.
I've used o4-mini in one of my earlier comparison blogs; you can check it out here.
Not much, though. The other two models are pretty cheap; it's just Claude Opus 4 that has a higher price per million input/output tokens.
$75 is only slightly higher than $15???
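For context on the price gap in this exchange: at launch, Anthropic listed Claude Opus 4 at $15 per million input tokens and $75 per million output tokens, and Claude Sonnet 4 at $3 and $15. Below is a minimal sketch of a per-request cost estimate. The model labels are informal names (not API model IDs), the token counts are hypothetical, and prices may have changed since launch.

```python
# Launch list prices in USD per million tokens: (input, output).
# Assumption: these may have changed; check Anthropic's pricing page.
PRICES_PER_MTOK = {
    "Claude Opus 4": (15.0, 75.0),
    "Claude Sonnet 4": (3.0, 15.0),
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given token counts."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: ~10k input tokens producing ~20k output tokens
# (illustrative numbers, not measured from the comparison).
for model in PRICES_PER_MTOK:
    print(f"{model}: ${run_cost(model, 10_000, 20_000):.2f}")
# prints:
# Claude Opus 4: $1.65
# Claude Sonnet 4: $0.33
```

At these prices, Opus 4 costs five times as much as Sonnet 4 for the same token counts, on both input and output.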
I don't understand. Is this the only use case? Why does everybody try to build games or 3D visualizations with AI? Why not pick real, practical examples that demonstrate commercial development?
It's pretty tough to come up with a concrete, practical example for this kind of testing. If I had one, I'd definitely use it.
Pretty insane how fast these models level up - I gotta admit, seeing AI spit out better code than me kinda stings but also fires me up to keep learning.
Many devs are in the same situation nowadays :)
true
🙌
Don't tell me this was built in one shot. 🤯 Are we cooked then? How is it building something like this in pure HTML/CSS/JS? I don't think there's much data like this that they've been trained on.
Yes, it did it in one shot and I'm not kidding.
What about testing this on hard LeetCode questions? I tried Sonnet 4 on one of the hard LeetCode questions and it got "Time Limit Exceeded". Clearly the AIs are heavily trained on web dev, but when it comes to general coding I'd say Gemini is better.
For this test, I decided to focus entirely on building stuff, not on algorithm/LeetCode questions. You can check out some of my earlier comparisons where I tested most of these models on LeetCode and Codeforces questions as well.
Still just toy factories.
If let loose in the wild by some greedy CEO who wants to save money on thorough review, the code these models produce will end up killing people en masse.
Don't forget, AI is just starting out and there will be lots and lots of improvements in the coming years. The models already launched are also going to get tons of upgrades moving forward.
Maybe, maybe not.
Super comprehensive breakdown - it's wild to see how fast these coding models are leveling up! Have you tried plugging Claude Opus 4 into your actual dev workflow yet, or just for these benchmarks?
Yeah, the real test comes with the actual dev workflow. I have yet to use it properly in my dev workflow, but so far it's been doing fine.
For the next few years, companies will keep launching new AI models to capture the tech market, the way Microsoft came to own the entire dev ecosystem. It's a never-ending rat race.
You're right; it currently seems like a race between Google, Anthropic, OpenAI, and a few others. New models launching every few weeks with such big improvements is really out of this world to me. At least we're getting something better with every single release.
I don't need any of this.
Folks, let me know your thoughts on this model, Claude Opus 4, in the comments. This one is wild. Do you see it becoming your go-to model for coding? 👀
Claude Opus 4 represents a significant advancement in AI language models, particularly because of its emphasis on coding, long-term task management, and safety. Unlike traditional models, Opus 4 appears to focus heavily on supporting complex workflows and reinforcing ethical guidelines, making it suitable for professional and sensitive applications.
Thanks for the info. Have you tried using Opus 4 in your workflow?
Thank you for sharing this insightful blog
You’re welcome :)
I am not so sure if I really understand what you are saying, Nathan! :)
o3 may have fallen short on frontend polish; however, when it comes to complex backend architectures or nuanced bug fixes, you might be surprised at how much better it is than other models.
Totally. I've heard good things about this model on the backend side. For the sake of this blog, I've kept the focus more on core logic and implementation, yk.
I've done a comparison post, not necessarily with o3, but o3-mini-high. If you'd like to take a look: dev.to/composiodev/claude-37-sonne...