Protein Complex Similarity
Proteins have manifold functions in living cells, including structural integrity, transport, defense against pathogens, and message transmission, to name but a few. Recent advances in machine learning appear to have solved the protein folding problem, i.e., how to obtain the three-dimensional functional protein structure from the amino acid sequence of the protein. However, proteins rarely act alone; instead, they perform their functions together with other proteins in so-called protein complexes. Quantifying the similarity between two protein complexes is essential for numerous applications, e.g., for database searches of complexes that are similar to a given input complex. While similarity measures have been extensively studied on single proteins and on protein families, there is still little work on modeling and computing the similarity between protein complexes. Because protein complexes can be naturally modeled as graphs, graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. We introduce a parametric family of similarity measures based on Weisfeiler-Leman labeling (see "The Weisfeiler-Leman Algorithm for Machine Learning with Graphs" in Section 4.2 in Volume 1). Based on simulated complexes, we show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, though it can be computed more efficiently. Moreover, in contrast to graph edit similarity, the proposed measures allow for an efficient similarity search in large volumes of protein complex data. They can therefore be used as a basis for large-scale machine learning applications.
- Stöcker, Bianca K.
- Schäfer, Till
- Mutzel, Petra
- Köster, Johannes
- Kriege, Nils M.
- Rahmann, Sven
Category | Book Section/Chapter
Divisions | Data Mining and Machine Learning
Title of Book | Machine Learning under Resource Constraints - Applications
ISSN/ISBN | 978-3-11-078597-5
Page Range | pp. 85-102
Date | 2022