Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text

Bioinform Adv. 2024 Sep 11;4(1):vbae133. doi: 10.1093/bioadv/vbae133. eCollection 2024.

Abstract

Motivation: Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As biomedical literature continues to grow rapidly, there is an increasing need for automated and accurate extraction of these interactions to facilitate scientific discovery. Pretrained language models, such as generative pretrained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing tasks.

Results: We evaluated the performance of PPI identification using multiple transformer-based models across three manually curated gold-standard corpora: Learning Language in Logic (164 interactions in 77 sentences), Human Protein Reference Database (163 interactions in 145 sentences), and Interaction Extraction Performance Assessment (335 interactions in 486 sentences). BERT-based models achieved the best overall performance, with BioBERT attaining the highest recall (91.95%) and F1 score (86.84%) on the Learning Language in Logic dataset. Despite not being explicitly trained on biomedical text, GPT-4 performed comparably to the BERT-based models, achieving the highest precision (88.37%), a recall of 85.14%, and an F1 score of 86.49% on the same dataset. These results suggest that GPT-4 can effectively detect protein interactions in text, offering valuable applications for mining the biomedical literature.
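For reference, the precision, recall, and F1 values reported above are assumed to follow the standard binary-classification definitions over true positives (TP), false positives (FP), and false negatives (FN); the abstract itself does not restate them, so this is a minimal sketch under that assumption:

% Assumed standard metric definitions; not stated explicitly in the source abstract.
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]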

Availability and implementation: The source code and datasets used in this study are available at https://github.com/hurlab/PPI-GPT-BERT.