Protein Complex Similarity

Abstract

Proteins have manifold functions in living cells, including structural integrity, transport, defense against pathogens, or message transmission, to name but a few. Recent advances in Machine Learning appear to have solved the protein folding problem, i.e., how to obtain the three-dimensional functional protein structure from the amino acid sequence of the protein. However, proteins rarely act alone, but instead perform their functions together with other proteins in so-called protein complexes. Quantifying the similarity between two protein complexes is essential for numerous applications, e.g., for database searches of complexes that are similar to a given input complex. While similarity measures have been extensively studied on single proteins and on protein families, there is so far little work on modeling and computing the similarity between protein complexes. Because protein complexes can be naturally modeled as graphs, graph similarity measures may be used, but these are often computationally hard to obtain and do not take typical properties of protein complexes into account. We introduce a parametric family of similarity measures based on Weisfeiler-Leman labeling (see "The Weisfeiler-Leman Algorithm for Machine Learning with Graphs" in Section 4.2 in Volume 1). Based on simulated complexes, we show that the defined family of similarity measures is in good agreement with edit similarity, a similarity measure derived from graph edit distance, though it can be computed more efficiently. Moreover, in contrast to graph edit similarity, the proposed measures allow for an efficient similarity search in large volumes of protein complex data. They can therefore be used as a basis for large-scale machine learning applications.

Authors
  • Stöcker, Bianca K.
  • Schäfer, Till
  • Mutzel, Petra
  • Köster, Johannes
  • Kriege, Nils M.
  • Rahmann, Sven
Shortfacts
Category: Book Section/Chapter
Divisions: Data Mining and Machine Learning
Title of Book: Machine Learning under Resource Constraints - Applications
ISSN/ISBN: 978-3-11-078597-5
Page Range: pp. 85-102
Date: 2022