More than just pattern recognition: Prediction of uncommon protein structure features by AI methods

Osnat Herzberg; John Moult

doi:10.1073/pnas.2221745120

Proc Natl Acad Sci U S A. 2023 Jul 11;120(28):e2221745120. doi: 10.1073/pnas.2221745120. Epub 2023 Jul 3.

Authors

Osnat Herzberg ¹ ², John Moult ¹ ³

Affiliations

¹ Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850.
² Chemistry and Biochemistry Department, University of Maryland, Chemistry Building, College Park, MD 20742.
³ Department of Cell Biology and Molecular Genetics, University of Maryland, Microbiology Building, College Park, MD 20742.

Abstract

The CASP14 experiment demonstrated the extraordinary structure modeling capabilities of artificial intelligence (AI) methods. That result has ignited a fierce debate about what these methods are actually doing. One of the criticisms has been that the AI does not have any sense of the underlying physics but is merely performing pattern recognition. Here, we address that issue by analyzing the extent to which the methods identify rare structural motifs. The rationale underlying the approach is that a pattern recognition machine tends to choose the more frequently occurring motifs, whereas some sense of subtle energetic factors is required to choose infrequently occurring ones. To reduce the possibility of bias from related experimental structures and to minimize the effect of experimental errors, we examined only CASP14 target protein crystal structures that were determined to a resolution limit better than 2 Å and that lacked significant amino acid sequence homology to proteins of known structure. In those experimental structures and in the corresponding models, we track cis peptides, π-helices, 3₁₀-helices, and other small 3D motifs that occur in the PDB at a frequency below 1% of total amino acid residues. The best-performing AI method, AlphaFold2, captured these uncommon structural elements exquisitely well. All discrepancies appeared to be a consequence of crystal environment effects. We propose that the neural network learned a protein structure potential of mean force, enabling it to correctly identify situations where unusual structural features represent the lowest local free energy because of subtle influences from the atomic environment.
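As an illustration of how one of the tracked features, a cis peptide, can be recognized from atomic coordinates, the following is a minimal sketch and not the authors' analysis pipeline. It assumes the conventional definition of the peptide dihedral angle ω over the atoms Cα(i), C(i), N(i+1), Cα(i+1), with ω near 0° for cis and near 180° for trans. The function names and the 30° tolerance are assumptions chosen for this example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def omega_deg(ca_i, c_i, n_next, ca_next):
    """Signed dihedral angle (degrees) about the C(i)-N(i+1) bond."""
    b0 = _sub(ca_i, c_i)        # from C(i) back to CA(i)
    b1 = _sub(n_next, c_i)      # along the peptide bond
    b2 = _sub(ca_next, n_next)  # from N(i+1) to CA(i+1)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def is_cis_peptide(ca_i, c_i, n_next, ca_next, tolerance_deg=30.0):
    """True if omega lies within tolerance_deg of 0 (assumed cutoff)."""
    return abs(omega_deg(ca_i, c_i, n_next, ca_next)) <= tolerance_deg
```

Applying the same test to each peptide bond in an experimental structure and in the corresponding model would give one way to count how many cis peptides the model reproduces. The other motifs named above (π-helices, 3₁₀-helices) would need different criteria, such as hydrogen-bonding patterns, which are not sketched here.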

Keywords: AI; CASP14; AlphaFold2; structure analysis.

MeSH terms

Amino Acid Sequence
Artificial Intelligence*
Neural Networks, Computer
Protein Conformation
Protein Structure, Secondary
Proteins* / chemistry

Substances

Proteins