ConDo: protein domain boundary prediction using coevolutionary information

Bioinformatics. 2019 Jul 15;35(14):2411-2417. doi: 10.1093/bioinformatics/bty973.

Abstract

Motivation: Domain boundary prediction is one of the most important problems in the study of protein structure and function. Many sequence-based domain boundary prediction methods are either template-based or machine learning (ML) based. ML-based methods often perform poorly because they use only local (i.e. short-range) features. These conventional features, such as sequence profiles, secondary structures and solvent accessibilities, are typically restricted to within 20 residues of the domain boundary candidate.

Results: To improve the performance of ML-based methods, we developed a new protein domain boundary prediction method (ConDo) that uses novel long-range features, such as coevolutionary information, in addition to the aforementioned local window features as inputs for ML. For this purpose, two types of coevolutionary information were extracted from multiple sequence alignment using direct coupling analysis: (i) partially aligned sequences, and (ii) correlated mutation information. Both the partially aligned sequence information and the modularity of residue-residue couplings carry long-range correlation information.
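
To illustrate why the modularity of residue-residue couplings can carry a boundary signal, the following is a minimal toy sketch and not ConDo's actual pipeline, which feeds such features into an ML model. It assumes a symmetric coupling matrix is already available (for example from a direct coupling analysis run) and scores each candidate cut by the mean coupling between residues on opposite sides of it. The function names `cross_coupling_profile` and `toy_coupling`, and the parameters `min_sep` and `min_len`, are hypothetical and chosen only for this example.

```python
def cross_coupling_profile(coupling, min_sep=5, min_len=3):
    """Mean coupling across each candidate cut (illustrative assumption,
    not the feature definition used by ConDo).

    coupling: symmetric n x n list of lists of coupling strengths.
    min_sep:  ignore pairs closer than this in sequence (short-range pairs).
    min_len:  minimum number of residues required on each side of a cut.
    Returns a dict mapping cut position b to the mean coupling between
    residues [0, b) and [b, n).
    """
    n = len(coupling)
    profile = {}
    for b in range(min_len, n - min_len + 1):
        total, count = 0.0, 0
        for i in range(b):
            for j in range(b, n):
                if j - i >= min_sep:
                    total += coupling[i][j]
                    count += 1
        # No qualifying pair: mark the cut as uninformative.
        profile[b] = total / count if count else float("inf")
    return profile


def toy_coupling(n, boundary, strong=1.0, weak=0.1):
    """Synthetic two-domain matrix: strong couplings within each domain,
    weak couplings between domains."""
    return [
        [strong if (i < boundary) == (j < boundary) else weak for j in range(n)]
        for i in range(n)
    ]


coupling = toy_coupling(12, 6)
profile = cross_coupling_profile(coupling, min_sep=2, min_len=3)
# The cut with the weakest cross-coupling is the most boundary-like.
best_cut = min(profile, key=profile.get)
```

On this synthetic matrix the weakest cross-coupling falls at the planted boundary. Real coupling matrices are noisy, which is one reason to treat a profile like this as an input feature alongside local window features rather than as a predictor on its own.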

Availability and implementation: https://github.com/gicsaw/ConDo.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Protein Domains
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Sequence Alignment

Substances

  • Proteins