Large Language Model-Based Neurosurgical Evaluation Matrix: A Novel Scoring Criteria to Assess the Efficacy of ChatGPT as an Educational Tool for Neurosurgery Board Preparation

World Neurosurg. 2023 Dec:180:e765-e773. doi: 10.1016/j.wneu.2023.10.043. Epub 2023 Oct 14.

Abstract

Introduction: Technological advancements are reshaping medical education, with digital tools becoming essential at all levels of training. Amidst this transformation, this study explores the potential of ChatGPT, an artificial intelligence model by OpenAI, in enhancing neurosurgical board education. The focus extends beyond technology adoption to its effective utilization, with ChatGPT's proficiency evaluated against practice questions from the Primary Neurosurgery Written Board Exam.

Methods: Using the Congress of Neurological Surgeons (CNS) Self-Assessment Neurosurgery (SANS) Exam Board Review Prep questions, we conducted 3 rounds of analysis with ChatGPT. We developed a novel ChatGPT Neurosurgical Evaluation Matrix (CNEM) to assess the output quality, accuracy, concordance, and clarity of ChatGPT's answers.

Results: ChatGPT achieved spot-on accuracy for 66.7% of prompted questions, 59.4% of unprompted questions, and 63.9% of unprompted questions with a leading phrase. Stratified by topic, accuracy ranged from 50.0% (Vascular) to 78.8% (Neuropathology). In comparison to SANS explanations, ChatGPT output was considered better in 19.1% of questions, equal in 51.6%, and worse in 29.3%. Concordance analysis showed that 95.5% of unprompted ChatGPT outputs and 97.4% of unprompted outputs with a leading phrase were aligned.
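
As a reading aid, the percentages reported above can be checked with a few lines of arithmetic. The sketch below is in Python and uses only figures stated in the Results; the variable names are illustrative and not taken from the study. The combined "better or equal" share is derived here by simple addition and is not a figure reported by the authors.

```python
# Percentages copied from the Results paragraph; names are illustrative only.
comparison_pct = {"better": 19.1, "equal": 51.6, "worse": 29.3}
accuracy_pct = {
    "prompted": 66.7,
    "unprompted": 59.4,
    "unprompted_with_leading_phrase": 63.9,
}

# The three comparison categories should account for all rated questions.
total = round(sum(comparison_pct.values()), 1)

# Share of outputs rated at least as good as the SANS explanation
# (derived by addition; not reported in the abstract).
at_least_equal = round(comparison_pct["better"] + comparison_pct["equal"], 1)

# Condition with the highest accuracy, and the gap between best and worst.
best_condition = max(accuracy_pct, key=accuracy_pct.get)
spread = round(max(accuracy_pct.values()) - min(accuracy_pct.values()), 1)

print(f"Comparison categories sum to {total}%")
print(f"Rated better than or equal to SANS: {at_least_equal}%")
print(f"Highest accuracy: {best_condition} (spread of {spread} percentage points)")
```

The abstract does not give the underlying question counts, so this check works on the rounded percentages alone.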

Conclusions: Our study evaluated the performance of ChatGPT in neurosurgical board education by assessing its accuracy, clarity, and concordance. The findings highlight the potential and challenges of integrating AI technologies like ChatGPT into medical and neurosurgical board education. Further research is needed to refine these tools and optimize their performance for enhanced medical education and patient care.

Keywords: AI evaluation matrix; Artificial intelligence; ChatGPT; Medical education technology; Neurosurgical education.

MeSH terms

  • Artificial Intelligence
  • Educational Status
  • Humans
  • Language
  • Neurosurgery*
  • Neurosurgical Procedures