Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes

Fereydoun Hormozdiari, Can Alkan, Evan E. Eichler, S. Cenk Sahinalp

Research output: Contribution to journal › Article

223 Citations (Scopus)

Abstract

Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The realization of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals have not been designed to handle the short read lengths and the errors implied by the "next-gen" sequencing (NGS) technologies. In this paper, we give combinatorial formulations for the SV detection between a reference genome sequence and a next-gen-based, paired-end, whole genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.
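The abstract does not spell out the formulations themselves. As background only, the sketch below shows the basic paired-end signal that this kind of SV detection starts from. A read pair whose mapped span or strand orientation disagrees with what the sequencing library should produce hints at a deletion, an insertion, or an inversion. The record type `PairMapping`, the function `classify_pair`, the forward/reverse library assumption, and the span thresholds are all hypothetical illustrations. This is not the authors' combinatorial formulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairMapping:
    """Hypothetical record for one mapping of a read pair (0-based reference coordinates)."""
    left_pos: int      # start of the read mapped further left on the reference
    right_pos: int     # start of the read mapped further right
    left_strand: str   # "+" or "-"
    right_strand: str  # "+" or "-"
    read_len: int      # read length, assumed equal for both ends


def classify_pair(m: PairMapping, min_span: int, max_span: int) -> Optional[str]:
    """Suggest an SV type from one pair mapping, or return None if the pair is concordant.

    Assumes a forward/reverse library: a normal pair maps as left "+", right "-"
    with a reference span in [min_span, max_span].
    """
    if m.left_strand == m.right_strand:
        # Both ends on the same strand: one end likely lies in an inverted segment.
        return "inversion"
    if m.left_strand == "-":
        # Reverse/forward orientation: discordant, but not one of the three types here.
        return "other"
    span = m.right_pos + m.read_len - m.left_pos
    if span > max_span:
        # Ends map farther apart than the library allows: sequence missing in the donor.
        return "deletion"
    if span < min_span:
        # Ends map closer together: extra sequence in the donor between the ends.
        return "insertion"
    return None


if __name__ == "__main__":
    # Library assumed to span 150-250 bp with 36 bp reads (illustrative values only).
    examples = [
        PairMapping(1000, 1150, "+", "-", 36),  # span 186 -> concordant
        PairMapping(1000, 5000, "+", "-", 36),  # span 4036 -> deletion
        PairMapping(1000, 1050, "+", "-", 36),  # span 86 -> insertion
        PairMapping(1000, 1150, "+", "+", 36),  # same strand -> inversion
    ]
    for pair in examples:
        print(classify_pair(pair, min_span=150, max_span=250))
```

Running it prints `None`, `deletion`, `insertion`, and `inversion` in turn. A single pair is weak evidence on its own, especially with short reads that may map to several places. Deciding which variants best explain many such pairs together is the harder problem the paper's formulations target, and it is not attempted in this sketch.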

Original language: English (US)
Pages (from-to): 1270-1278
Number of pages: 9
Journal: Genome Research
Volume: 19
Issue number: 7
DOIs: 10.1101/gr.088633.108
State: Published, Jul 1 2009
Externally published: Yes

Fingerprint

Genome
Technology
Genomic Structural Variation
Biological Science Disciplines
Human Genome
Single Nucleotide Polymorphism
Neoplasms

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. / Hormozdiari, Fereydoun; Alkan, Can; Eichler, Evan E.; Sahinalp, S. Cenk.

In: Genome Research, Vol. 19, No. 7, 01.07.2009, p. 1270-1278.

Hormozdiari, Fereydoun ; Alkan, Can ; Eichler, Evan E. ; Sahinalp, S. Cenk. / Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. In: Genome Research. 2009 ; Vol. 19, No. 7. pp. 1270-1278.
@article{6f8ffbd954b94e49ae51507940030663,
title = "Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes",
author = "Hormozdiari, Fereydoun and Alkan, Can and Eichler, {Evan E.} and Sahinalp, {S. Cenk}",
year = "2009",
month = "7",
day = "1",
doi = "10.1101/gr.088633.108",
language = "English (US)",
volume = "19",
pages = "1270--1278",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "7",

}

TY - JOUR

T1 - Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes

AU - Hormozdiari, Fereydoun

AU - Alkan, Can

AU - Eichler, Evan E.

AU - Sahinalp, S. Cenk

PY - 2009/7/1

Y1 - 2009/7/1

UR - http://www.scopus.com/inward/record.url?scp=67650064593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650064593&partnerID=8YFLogxK

U2 - 10.1101/gr.088633.108

DO - 10.1101/gr.088633.108

M3 - Article

C2 - 19447966

AN - SCOPUS:67650064593

VL - 19

SP - 1270

EP - 1278

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 7

ER -