Introduction: Patients with diabetes require healthcare information that is accurate and comprehensive. Large language models (LLMs) such as ChatGPT promise to provide such information at scale. This study aimed to determine (a) the comprehensiveness of ChatGPT's Urdu responses to diabetes-related questions and (b) the accuracy of ChatGPT's Urdu responses compared with its English responses.
Methods: A cross-sectional observational study was conducted. Two reviewers experienced in internal medicine and endocrinology graded 53 pairs of Urdu and English responses to questions on diabetes knowledge, lifestyle, and prevention; a senior reviewer resolved discrepancies. Urdu responses were assessed for comprehensiveness and accuracy and then compared with their English counterparts.
Results: Of the 53 Urdu responses, only two (3.8%) were graded as comprehensive, and five (9.4%) were graded as correct but inadequate. The largest proportion, 25 of 53 (47.2%), were graded as a mixture of correct and incorrect or outdated information. On the scale comparing the accuracy of Urdu and English responses, no Urdu response (0.0%) was rated more accurate than its English counterpart, and the overwhelming majority, 49 of 53 (92.5%), were rated less accurate than the corresponding English responses.
Conclusion: Although ChatGPT's ability to retrieve diabetes-related information in Urdu is impressive, it should be used only as an adjunct to, rather than a sole source of, medical information. Further work is needed to optimize Urdu responses in medical contexts if this technology's potential is to be realized for Urdu-speaking patients.
Keywords: Artificial intelligence; chronic disease management; health communication; patient education as a topic; telemedicine.