AI advancement: OpenAI introduces GPT-5, claiming it's equivalent to conversing with a PhD-level expert
OpenAI has unveiled its latest AI model, GPT-5. According to OpenAI, GPT-5 is "smarter, faster, and more useful" than its predecessors, and initial tests indicate noticeable improvements.
The effectiveness of GPT-5 will only be confirmed through daily use, but initial benchmarks and evaluations suggest a significant leap in coding, reasoning, and writing capabilities.
In coding, GPT-5 has demonstrated impressive performance. It achieves 74.9% on the SWE-bench Verified benchmark, outperforming previous models like o3 and GPT-4o by a wide margin. GPT-5 is proficient in handling complex programming tasks and following multi-step instructions.
GPT-5's reasoning abilities are also noteworthy. It scores 94.6% on the AIME 2025 math exam without tools and leads on other reasoning benchmarks such as HMMT 2025 and GPQA. Its "deep thinking" mode improves accuracy and reduces error rates, closing the gap with expert-level problem solving.
In writing and multimodal understanding, GPT-5 generates more nuanced and complex language: it handles sarcasm and irony, maintains better narrative flow, and deals better with ambiguity and subtle metaphor. It also excels at multimodal tasks involving images, videos, and diagrams, supporting real-world use cases in areas like healthcare and education.
Efficiency and safety are also key improvements with GPT-5. It is more efficient, using fewer tokens, and operates faster while consuming less energy than GPT-4. GPT-5 produces up to 80% fewer factual errors during reasoning, reduces sycophantic responses by over 50%, and improves honesty about its limitations.
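Figures like "80% fewer factual errors" are relative reductions, so the error rate that remains depends on the starting rate, which is not given here. The Python sketch below shows only that arithmetic. The function name and the baseline rates are hypothetical and are not figures from OpenAI.

```python
def rate_after_relative_reduction(baseline_rate: float, relative_reduction: float) -> float:
    """Return the rate that remains after cutting baseline_rate by relative_reduction.

    Both arguments are fractions between 0 and 1 (0.80 means 80%).
    """
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baselines, NOT published numbers: they only illustrate the arithmetic.
for baseline in (0.20, 0.05):
    remaining = rate_after_relative_reduction(baseline, 0.80)
    print(f"baseline {baseline:.0%} -> {remaining:.0%} after an 80% relative reduction")
```

The same relative reduction leaves very different absolute error rates depending on the baseline, which is why such claims are best read alongside the absolute hallucination rates.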
Reliability is another area where GPT-5 shines. With hallucination rates under 1% on open-source prompts and just 1.6% on hard medical questions, GPT-5 is noted as the most factual and reliable OpenAI model yet.
ChatGPT, OpenAI's consumer-facing chatbot, will also no longer directly answer questions like "Should I break up with my girlfriend?" Instead, it will guide users through the decision-making process.
The release makes GPT-5 OpenAI's latest competitive weapon in the AI race, and the company is positioning it as a serious assistant for developers. Some even claim that it delivers PhD-level performance in coding, writing, and reasoning.
Meanwhile, Anthropic has revoked OpenAI's access to its API, indicating increasing competition among AI labs. This development underscores the rapid pace of advancements in the AI field and the escalating race among tech giants to lead the way.
Elon Musk previously claimed that his own AI, Grok, was better than PhD level in everything. Only time will tell whether GPT-5 truly lives up to its promises and redefines the landscape of AI.