Reconstructing complex regions of genomes using long-read sequencing technology

John Huddleston, Swati Ranade, Maika Malig, Francesca Antonacci, Mark Chaisson, Lawrence Hon, Peter H. Sudmant, Tina A. Graves, Can Alkan, Megan Dennis, Richard K. Wilson, Stephen W. Turner, Jonas Korlach, Evan E. Eichler

Research output: Contribution to journal, Article

138 Citations (Scopus)

Abstract

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

Original language: English (US)
Pages (from-to): 688-696
Number of pages: 9
Journal: Genome Research
Volume: 24
Issue number: 4
DOI: https://doi.org/10.1101/gr.168450.113
State: Published - Jan 1 2014
Externally published: Yes

Fingerprint

Genome
Technology
Clone Cells
Genomic Segmental Duplications
Costs and Cost Analysis
Pan troglodytes
Human Genome
Chromosomes
Organizations

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
  • Medicine (all)

Cite this

Huddleston, J., Ranade, S., Malig, M., Antonacci, F., Chaisson, M., Hon, L., ... Eichler, E. E. (2014). Reconstructing complex regions of genomes using long-read sequencing technology. Genome Research, 24(4), 688-696. https://doi.org/10.1101/gr.168450.113

@article{d9c4098c79aa48bc88693cfa46ae3dc2,
title = "Reconstructing complex regions of genomes using long-read sequencing technology",
abstract = "Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994{\%} identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.",
author = "John Huddleston and Swati Ranade and Maika Malig and Francesca Antonacci and Mark Chaisson and Lawrence Hon and Sudmant, {Peter H.} and Graves, {Tina A.} and Can Alkan and Megan Dennis and Wilson, {Richard K.} and Turner, {Stephen W.} and Jonas Korlach and Eichler, {Evan E.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1101/gr.168450.113",
language = "English (US)",
volume = "24",
pages = "688--696",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "4",

}

TY - JOUR
T1 - Reconstructing complex regions of genomes using long-read sequencing technology
AU - Huddleston, John
AU - Ranade, Swati
AU - Malig, Maika
AU - Antonacci, Francesca
AU - Chaisson, Mark
AU - Hon, Lawrence
AU - Sudmant, Peter H.
AU - Graves, Tina A.
AU - Alkan, Can
AU - Dennis, Megan
AU - Wilson, Richard K.
AU - Turner, Stephen W.
AU - Korlach, Jonas
AU - Eichler, Evan E.
PY - 2014/1/1
Y1 - 2014/1/1
N2 - Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

UR - http://www.scopus.com/inward/record.url?scp=84897965254&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84897965254&partnerID=8YFLogxK
U2 - 10.1101/gr.168450.113
DO - 10.1101/gr.168450.113
M3 - Article
C2 - 24418700
AN - SCOPUS:84897965254
VL - 24
SP - 688
EP - 696
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 4
ER -