Raising the estimate of functional human sequences

Genome Res. 2007 Sep;17(9):1245-53. doi: 10.1101/gr.6406307. Epub 2007 Aug 9.

Abstract

While less than 1.5% of the mammalian genome encodes proteins, it is now evident that the vast majority is transcribed, mainly into non-protein-coding RNAs. This raises the question of what fraction of the genome is functional, i.e., composed of sequences that yield functional products, are required for the expression (regulation or processing) of these products, or are required for chromosome replication and maintenance. Many of the observed noncoding transcripts are differentially expressed, and, while most have not yet been studied, increasing numbers are being shown to be functional and/or trafficked to specific subcellular locations, and to exhibit subtle evidence of selection. On the other hand, analyses of conservation patterns indicate that only approximately 5% (3%-8%) of the human genome is under purifying selection for functions common to mammals. However, these estimates rely on the assumption that reference sequences (usually ancient transposon-derived sequences) have evolved neutrally. This may not be the case, and if these sequences are themselves under some constraint, the fraction of the genome under evolutionary constraint would be underestimated. These analyses also do not detect functional sequences that are evolving rapidly and/or have acquired lineage-specific functions. Indeed, many regulatory sequences and known functional noncoding RNAs, including many microRNAs, are not conserved over significant evolutionary distances, and recent evidence from the ENCODE project suggests that many functional elements show no detectable level of sequence constraint. Thus, it is likely that much more than 5% of the genome encodes functional information, and although the upper bound is unknown, it may be considerably higher than currently thought.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Base Sequence
  • Conserved Sequence
  • Evolution, Molecular
  • Genome, Human
  • Humans
  • Interspersed Repetitive Sequences*
  • MicroRNAs
  • Molecular Sequence Data
  • RNA, Untranslated
  • Selection, Genetic
  • Transcription, Genetic

Substances

  • MicroRNAs
  • RNA, Untranslated