Mapping of small RNAs in the human ENCODE regions

Am J Hum Genet. 2008 Apr;82(4):971-81. doi: 10.1016/j.ajhg.2008.02.016.

Abstract

The elucidation of the largely unknown transcriptome of small RNAs is crucial for the understanding of genome and cellular function. We report here the results of the analysis of small RNAs (< 50 nt) in the ENCODE regions of the human genome. Size-fractionated RNAs from four different cell lines (HepG2, HelaS3, GM06990, SK-N-SH) were mapped with the forward and reverse ENCODE high-density resolution tiling arrays. The top 1% of hybridization signals are termed SmRfrags (Small RNA fragments). Eight percent of SmRfrags overlap the GENCODE genes (CDS), given that the majority map to intergenic regions (34%), intronic regions (53%), and untranslated regions (UTRs) (5%). In addition, 9.6% and 16.8% of SmRfrags in the 5' UTR regions overlap significantly with His/Pol II/TAF250 binding sites and DNase I Hypersensitive sites, respectively (compared to the 5.3% and 9% expected). Interestingly, 17%-24% (depending on the cell line) of SmRfrags are sense-antisense strand pairs that show evidence of overlapping transcription. Only 3.4% and 7.2% of SmRfrags in intergenic regions overlap transcribed fragments (Txfrags) in HeLa and GM06990 cell lines, respectively. We hypothesized that a fraction of the identified SmRfrags corresponded to microRNAs. We tested by Northern blot a set of 15 high-likelihood predictions of microRNA candidates that overlap with smRfrags and validated three potential microRNAs ( approximately 20 nt length). Notably, most of the remaining candidates showed a larger hybridizing band ( approximately 100 nt) that could be a microRNA precursor. The small RNA transcriptome is emerging as an important and abundant component of the genome function.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions / genetics
  • Base Sequence
  • Cell Line, Tumor
  • Chromosome Mapping*
  • Genome, Human / genetics*
  • Humans
  • MicroRNAs / genetics*
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis
  • Transcription, Genetic*

Substances

  • 5' Untranslated Regions
  • MicroRNAs