
Neurosurgery residents score higher than ChatGPT on board-style exam questions—study

Published Oct 16, 2025 5:25 pm

AI tools may be smart, but it looks like human professionals are still on top.

In a recent study published in Neurosurgical Review, University of the Philippines Manila researchers tested ChatGPT's performance on neurosurgery board examination-like questions against that of neurosurgery residents.

They searched for studies published between November 2022 (when ChatGPT was launched) and October 25, 2024, following the PRISMA guidelines.

PRISMA, which stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses, is a set of reporting standards designed to help researchers clearly explain what was done and what was found.

Two reviewers carefully selected studies that tested ChatGPT on such questions and directly compared its answers with those of neurosurgery residents.

After screening, six studies were selected for qualitative and quantitative analysis.

Researchers found that ChatGPT's accuracy ranged from 50.4% to 78.8%, while the residents' scores ranged from 58.3% to 73.7%.

According to the researchers, despite the overlapping ranges, the pooled results showed an "overall trend favoring neurosurgery residents versus ChatGPT."

"Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although reviewed studies had high heterogeneity," the researchers stated.

They noted, however, that "further improvement is necessary before [ChatGPT] can become a useful and reliable supplementary tool in the delivery of neurosurgical education."

ChatGPT is an AI language model developed by OpenAI, designed to understand and generate human-like text based on the input it receives.

It can be compared to a very advanced chatbot that can answer questions, write essays, summarize text, help with coding, and more.