Protein complexes are the foundation of all cellular activities, and identifying them accurately is crucial for studying cellular systems. The efficient discovery of protein complexes is therefore a focus of research in bioinformatics. Most existing methods for protein complex identification rely on the structure of the protein-protein interaction (PPI) network, whereas some integrate biological information to enrich the features of the protein network. However, existing methods cannot fully integrate network topology information with biological attribute information: most are based on homogeneous networks and cannot distinguish the importance of different attributes and protein nodes. To address these issues, a GO attribute Heterogeneous Attention network Embedding (GHAE) method based on heterogeneous protein information networks is proposed. First, GHAE incorporates Gene Ontology (GO) information into the PPI network to construct a heterogeneous protein information network. Then, GHAE uses a dual attention mechanism and a heterogeneous graph convolutional representation learning method to learn protein features and identify protein complexes. The experimental results show that building heterogeneous protein information networks can fully integrate valuable biological information, and that the heterogeneous graph embedding learning method can simultaneously mine the features of proteins and GO attributes, thereby improving the performance of protein complex identification.
Keywords: attention mechanism; protein complexes; protein information network; representation learning.
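
The first step described in the abstract, adding GO information to the PPI network to form a heterogeneous network, can be illustrated with a minimal sketch. It is not the GHAE implementation: the function names, the node-type labels "protein" and "go", and the toy identifiers (P1, GO_a, and so on) are all assumptions made for illustration, and the attention and graph convolution steps are not shown.

```python
from collections import defaultdict


def build_heterogeneous_network(ppi_edges, go_annotations):
    """Combine PPI edges and GO annotations into one typed, undirected graph.

    ppi_edges: iterable of (protein, protein) pairs.
    go_annotations: dict mapping a protein to the GO terms annotating it.
    Returns (node_type, adjacency), where node_type maps each node to
    "protein" or "go" and adjacency maps each node to its neighbor set.
    """
    node_type = {}
    adjacency = defaultdict(set)

    # Protein-protein edges come from the PPI network.
    for a, b in ppi_edges:
        node_type[a] = "protein"
        node_type[b] = "protein"
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Protein-GO edges come from the annotations (hypothetical input format).
    for protein, terms in go_annotations.items():
        node_type.setdefault(protein, "protein")
        for term in terms:
            node_type[term] = "go"
            adjacency[protein].add(term)
            adjacency[term].add(protein)

    return node_type, dict(adjacency)


def neighbors_by_type(node, node_type, adjacency, wanted):
    """Return the sorted neighbors of `node` whose type equals `wanted`."""
    return sorted(n for n in adjacency.get(node, ()) if node_type[n] == wanted)


# Toy data with made-up identifiers, for illustration only.
toy_ppi = [("P1", "P2"), ("P2", "P3")]
toy_go = {"P1": ["GO_a"], "P2": ["GO_a", "GO_b"]}

node_type, adjacency = build_heterogeneous_network(toy_ppi, toy_go)
protein_neighbors = neighbors_by_type("P2", node_type, adjacency, "protein")
go_neighbors = neighbors_by_type("P2", node_type, adjacency, "go")
```

Keeping protein neighbors and GO neighbors separable by type is what lets a model weight the two kinds of neighbor differently, which a homogeneous network cannot express.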