Protein domain decomposition using a graph-theoretic approach

Y Xu; D Xu; H N Gabow

doi:10.1093/bioinformatics/16.12.1091

Bioinformatics. 2000 Dec;16(12):1091-104. doi: 10.1093/bioinformatics/16.12.1091.

Authors

Y Xu¹, D Xu, H N Gabow

Affiliation

¹ Computational Biosciences Section, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830-6480, USA. xyn@ornl.gov

PMID: 11159328
DOI: 10.1093/bioinformatics/16.12.1091

Abstract

Motivation: Automatic decomposition of a multi-domain protein into its individual domains remains an unsolved problem of considerable interest. Because the number of protein structures in the PDB is growing exponentially, more reliable and efficient domain decomposition methods are needed simply to keep the domain databases up to date.

Results: We present a new graph-theoretic algorithm for the domain decomposition problem. We formulate it as a network flow problem in which each residue of a protein is a node of the network and each residue-residue contact is an edge whose capacity depends on the type of contact. A two-domain decomposition is obtained by finding a bottleneck (a minimum cut) of the network, that is, a partition that minimizes the total capacity of the edges crossing it, using the classical Ford-Fulkerson algorithm. A multi-domain decomposition is obtained by repeatedly solving two-domain problems. The algorithm has been implemented as a computer program called DomainParser. We tested the program on a commonly used test set of 55 proteins. The decomposition results are in 78.2% agreement with the literature on both the number of decomposed domains and the assignment of residues to each domain, which compares favorably with existing programs. On the subset of 20 two-domain proteins, the program assigned 96.7% of the residues correctly when the number of decomposed domains was required to be two.
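The two-domain step described above can be illustrated with a minimal Python sketch. It is not the DomainParser implementation: the contact capacities, the choice of source and sink residues, and the function name `min_cut_two_domains` are all assumptions made for illustration, since the abstract does not specify them. The sketch uses the breadth-first (Edmonds-Karp) variant of Ford-Fulkerson and reads the two domains off the residual network once no augmenting path remains.

```python
from collections import deque


def min_cut_two_domains(n, contacts, source, sink):
    """Split residues 0..n-1 into two groups by a minimum source-sink cut.

    contacts: iterable of (i, j, capacity) undirected residue-residue contacts.
    source, sink: hypothetical seed residues, one assumed to lie in each domain.
    Returns (cut_capacity, source_side, sink_side).
    """
    # Residual capacities; an undirected contact carries capacity both ways.
    cap = [dict() for _ in range(n)]
    for i, j, c in contacts:
        cap[i][j] = cap[i].get(j, 0) + c
        cap[j][i] = cap[j].get(i, 0) + c

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the smallest residual capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push that much flow along the path and update the residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Residues still reachable from the source form one side of the cut.
    source_side = sorted(parent)
    sink_side = sorted(set(range(n)) - set(parent))
    return flow, source_side, sink_side


# Toy example (made-up capacities): two tightly packed triplets of
# residues joined by a single weak contact between residues 2 and 3.
contacts = [(0, 1, 3), (1, 2, 3), (0, 2, 3),
            (3, 4, 3), (4, 5, 3), (3, 5, 3),
            (2, 3, 1)]
cut, domain_a, domain_b = min_cut_two_domains(6, contacts, source=0, sink=5)
print(cut, domain_a, domain_b)  # 1 [0, 1, 2] [3, 4, 5]
```

In this toy network the cheapest way to separate residue 0 from residue 5 is to cut the single weak contact, so the two triplets come out as separate domains. A multi-domain decomposition, as the abstract describes, would apply such a split repeatedly to the resulting parts; the stopping criterion is not given in the abstract and is left out here.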

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computational Biology
Computer Graphics*
Databases, Factual
Models, Molecular
Protein Structure, Tertiary*
Software