The academic community is taking notice after OpenAI’s most recent AI model, “ChatGPT o3,” scored almost flawlessly on the notoriously difficult JEE Advanced 2025 exam. The Joint Entrance Examination (JEE) Advanced, India’s most competitive university entrance exam, is the gateway to the prestigious Indian Institutes of Technology (IITs).
The experiment was led by Anushka Aashvi, an engineer from IIT Kharagpur, who started it as a side project. The results, however, were anything but ordinary: with a score of 327 out of 360, ChatGPT o3 would have secured an All India Rank (AIR) of 4 in the real exam.
Aashvi carefully replicated real exam conditions for the AI.
To rule out any memory bias, each question was posed in a fresh chat session, and no corrections or hints were given.
Even under these strict constraints, ChatGPT o3 performed impressively. Losing only a few marks in Physics and in the earlier sections, the AI earned perfect scores of 60 in both Chemistry and Mathematics in the second paper of the simulated exam.
This exceptional performance by an AI chatbot on such a demanding test, designed for human candidates, shows how rapidly AI capabilities are advancing. It also raises important questions about AI’s potential effects on competitive testing, on education, and on the very definition of “intelligence.”
Meanwhile, a separate study by Apple researchers highlights the shortcomings of well-known AI models such as DeepSeek, Claude, and ChatGPT o3. Although these models produce confident, articulate answers, they frequently struggle with highly complex problems.
In a recently released research paper titled “The Illusion of Thinking,” Apple’s team argues that even today’s most sophisticated language models may not actually reason in the way they are often assumed to. According to the paper, these models can convincingly simulate intelligence, yet their performance tends to break down as problems become genuinely difficult.