De novo assembly of human genome at single-cell levels

Nucleic Acids Res. 2022 Jul 22;50(13):7479-7492. doi: 10.1093/nar/gkac586.

Abstract

Genome assembly has been benefited from long-read sequencing technologies with higher accuracy and higher continuity. However, most human genome assembly require large amount of DNAs from homogeneous cell lines without keeping cell heterogeneities, since cell heterogeneity could profoundly affect haplotype assembly results. Herein, using single-cell genome long-read sequencing technology (SMOOTH-seq), we have sequenced K562 and HG002 cells on PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms and conducted de novo genome assembly. For the first time, we have completed the human genome assembly with high continuity (with NG50 of ∼2 Mb using 95 individual K562 cells) at single-cell levels, and explored the impact of different assemblers and sequencing strategies on genome assembly. With sequencing data from 30 diploid individual HG002 cells of relatively high genome coverage (average coverage ∼41.7%) on ONT platform, the NG50 can reach over 1.3 Mb. Furthermore, with the assembled genome from K562 single-cell dataset, more complete and accurate set of insertion events and complex structural variations could be identified. This study opened a new chapter on the practice of single-cell genome de novo assembly.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Nanopores*
  • Sequence Analysis, DNA / methods