Recently, I systematically tested the coding abilities of various models on ChatGOT, one of the platforms that combine multiple AIs.
Here are my test results:
Claude 4 Sonnet clearly leads in agentic capabilities. Given clear requirements, it can write documentation, create tests, and write code.
It can also debug, search documentation, and correct its own code, working continuously for hours until all tests pass. In this respect, Claude is unbeatable; GPT and the entire Gemini lineup cannot compete.
When it comes to data structures and algorithms—essentially “hard intelligence”—the O3 series excels. Only O3 consistently provides optimal algorithms and architectural solutions.
Claude often settles for suboptimal algorithms; it makes relatively few high-level errors but plenty of low-level mistakes.
Gemini’s hard intelligence is slightly weaker than O3 but considerably stronger than Claude 4. In terms of following instructions, Gemini is slightly better than GPT, and while it may be the most powerful overall, it is rarely utilized to its full potential.
Most of the time, I use O3 as a professional architect and algorithm engineer to tackle the most challenging parts, while Claude acts as a junior programmer, gradually filling in various details and steadily working in the background.
Gemini’s strong instruction comprehension and broad knowledge, despite occasional hallucinations, are most useful in conversational modes.