Improved measures for evolutionary conservation that exploit taxonomy distances

Nat Commun. 2019 Apr 5;10(1):1556. doi: 10.1038/s41467-019-09583-2.

Abstract

Selective pressures on protein-coding regions that provide fitness advantages can lead to the regions' fixation and conservation across genome duplications and speciation events. Consequently, conservation analyses based on sequence similarity are used throughout the biosciences to identify functionally important protein regions. Although powerful, existing conservation measures based on multiple sequence alignments are now so widely used that further improvements on many problems have become incremental. We introduce a new framework for evolutionary conservation with measures that exploit taxonomy distances across species. Results show that our taxonomy-based framework comfortably outperforms existing conservation measures in identifying deleterious variants observed in the human population, including variants located in non-abundant sequence domains such as intrinsically disordered regions. The predictive power of our approach emphasizes that the phenotypic effects of sequence variants can be specific to a taxonomy level, and thus conservation needs to be interpreted accordingly.
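
The abstract does not specify how the taxonomy-based measures are computed. As a rough illustration of the general idea only, the toy sketch below contrasts a plain identity score for one alignment column with two scores that also use each species' taxonomy distance from human. The function names, the weighting scheme, the species list, and the distance values are all assumptions made for this example and are not taken from the paper. The column is assumed to be non-empty.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data: one alignment column, keyed by species.
# Each value is (residue, taxonomy_distance_from_human); the distances
# are made-up rank counts, not values from the paper.
Column = Dict[str, Tuple[str, int]]


def identity_conservation(column: Column, ref: str) -> float:
    """Fraction of species whose residue equals the reference residue."""
    residues = [res for res, _ in column.values()]
    return sum(res == ref for res in residues) / len(residues)


def distance_weighted_conservation(column: Column, ref: str) -> float:
    """Like identity conservation, but each species counts in proportion
    to its taxonomy distance, so agreement in distant taxa weighs more."""
    total = sum(dist for _, dist in column.values())
    matching = sum(dist for res, dist in column.values() if res == ref)
    return matching / total


def nearest_variant_distance(column: Column, alt: str) -> Optional[int]:
    """Smallest taxonomy distance at which the alternate residue is seen,
    or None if no species in the column carries it."""
    dists = [dist for res, dist in column.values() if res == alt]
    return min(dists) if dists else None


column: Column = {
    "chimpanzee": ("R", 1),
    "mouse": ("R", 3),
    "chicken": ("R", 5),
    "zebrafish": ("K", 6),
    "fruit_fly": ("K", 9),
}

print(identity_conservation(column, "R"))           # 3 of 5 species match
print(distance_weighted_conservation(column, "R"))  # matches weighted by distance
print(nearest_variant_distance(column, "K"))        # closest species carrying K
```

In this toy column the reference residue R is shared only by the closer species, so the distance-weighted score is lower than the plain identity score, and the alternate residue K first appears at a fairly large distance. This is one simple way a conservation signal could depend on the taxonomy level at which it is read.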

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Classification / methods
  • Evolution, Molecular*
  • Genetic Variation*
  • Humans
  • Proteins / chemistry
  • Proteins / genetics*
  • Sequence Alignment
  • Sequence Analysis, Protein

Substances

  • Proteins