Accurate and exact CNV identification from targeted high-throughput sequence data

Alexander Nord, Ming Lee, Mary Claire King, Tom Walsh

Research output: Contribution to journal › Article

112 Citations (Scopus)

Abstract

Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data.

Results: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate.

Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
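The coverage-comparison step described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed windows, the normalization by total depth, the use of a median across reference samples, and the 0.7 / 1.3 cutoffs are all assumptions chosen for illustration, and the breakpoint-confirmation step is not shown.

```python
from statistics import median


def normalize(depths):
    """Scale per-window read depths so they sum to 1 (removes library-size effects)."""
    total = sum(depths)
    if total == 0:
        raise ValueError("sample has no coverage")
    return [d / total for d in depths]


def call_cnv_candidates(test_depths, reference_depths,
                        loss_cutoff=0.7, gain_cutoff=1.3):
    """Flag windows whose normalized coverage departs from the reference samples.

    test_depths      : per-window depths for the sample being screened
    reference_depths : list of per-window depth lists, one per comparison sample
    The cutoffs are hypothetical values, not taken from the paper.
    Returns a list of (window_index, ratio, estimated_copies, state) tuples.
    """
    test = normalize(test_depths)
    refs = [normalize(r) for r in reference_depths]
    calls = []
    for i, t in enumerate(test):
        # Median of the reference samples serves as the expected diploid coverage.
        baseline = median(r[i] for r in refs)
        if baseline == 0:
            calls.append((i, None, None, "no_data"))
            continue
        ratio = t / baseline
        if ratio < loss_cutoff:
            state = "loss"
        elif ratio > gain_cutoff:
            state = "gain"
        else:
            state = "normal"
        # Assuming a diploid baseline, copy count is roughly twice the ratio.
        calls.append((i, ratio, 2 * ratio, state))
    return calls


if __name__ == "__main__":
    references = [[100, 100, 100, 100, 100] for _ in range(3)]
    sample = [100, 100, 50, 100, 100]  # third window has about half the depth
    for window, ratio, copies, state in call_cnv_candidates(sample, references):
        print(window, state)
```

In this sketch a candidate window would still need confirmation, for example by looking for reads that span the putative breakpoint, which is the second stage the abstract describes.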

Original language: English (US)
Article number: 184
Journal: BMC Genomics
Volume: 12
DOIs: 10.1186/1471-2164-12-184
State: Published - Apr 12 2011
Externally published: Yes

Fingerprint

High-Throughput Nucleotide Sequencing
Mutation
Nucleotides
Technology
DNA
Genes
Datasets

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Accurate and exact CNV identification from targeted high-throughput sequence data. / Nord, Alexander; Lee, Ming; King, Mary Claire; Walsh, Tom.

In: BMC Genomics, Vol. 12, 184, 12.04.2011.

@article{a39de9ba6b4143d0bca99a406845864e,
title = "Accurate and exact CNV identification from targeted high-throughput sequence data",
abstract = "Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Results: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.",
author = "Alexander Nord and Ming Lee and King, {Mary Claire} and Tom Walsh",
year = "2011",
month = "4",
day = "12",
doi = "10.1186/1471-2164-12-184",
language = "English (US)",
volume = "12",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Accurate and exact CNV identification from targeted high-throughput sequence data

AU - Nord, Alexander

AU - Lee, Ming

AU - King, Mary Claire

AU - Walsh, Tom

PY - 2011/4/12

Y1 - 2011/4/12

AB - Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Results: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

UR - http://www.scopus.com/inward/record.url?scp=79953855362&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79953855362&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-12-184

DO - 10.1186/1471-2164-12-184

M3 - Article

C2 - 21486468

AN - SCOPUS:79953855362

VL - 12

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 184

ER -