Resolving the complexity of the human genome using single-molecule sequencing

Mark J.P. Chaisson, John Huddleston, Megan Dennis, Peter H. Sudmant, Maika Malig, Fereydoun Hormozdiari, Francesca Antonacci, Urvashi Surti, Richard Sandstrom, Matthew Boitano, Jane M. Landolin, John A. Stamatoyannopoulos, Michael W. Hunkapiller, Jonas Korlach, Evan E. Eichler

Research output: Contribution to journal (Article, peer-reviewed)

380 Scopus citations

Abstract

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome - 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.

Original language: English (US)
Pages (from-to): 608-611
Number of pages: 4
Journal: Nature
Volume: 517
Issue number: 7536
State: Published - Jan 29 2015
Externally published: Yes

ASJC Scopus subject areas

  • General
