deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data

Andrew McPherson, Fereydoun Hormozdiari, Abdalnasser Zayed, Ryan Giuliany, Gavin Ha, Mark G.F. Sun, Malachi Griffith, Alireza Moussavi, Janine Senz, Nataliya Melnyk, Marina Pacheco, Marco A. Marra, Martin Hirst, Torsten O. Nielsen, S. Cenk Sahinalp, David Huntsman, Sohrab P. Shah

Research output: Contribution to journal › Article

316 Citations (Scopus)

Abstract

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and have trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
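
The abstract summarizes classifier performance as an area under the ROC curve (AUC) of 0.91. As a minimal illustration of what that number measures (a generic sketch, not code from deFuse; the function name `roc_auc` and the scores below are hypothetical), the AUC equals the fraction of (true positive, true negative) pairs that the classifier ranks in the correct order, counting ties as one half:

```python
def roc_auc(scores, labels):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve.

    scores: classifier outputs; higher means "more likely a real fusion".
    labels: 1 for a validated true positive, 0 for a true negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive against every negative; ties count as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four candidate fusions (illustrative, not data from the paper).
scores = [0.9, 0.8, 0.35, 0.2]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))
```

With these hypothetical scores, three of the four positive/negative pairs are ordered correctly, so the sketch prints 0.75. The pairwise loop is quadratic in the number of examples, which is harmless for a validation set the size of the 121 RT-PCR-tested sequences described above; larger sets would usually sort the scores once instead.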

Original language: English (US)
Article number: e1001138
Journal: PLoS Computational Biology
Volume: 7
Issue number: 5
DOI: https://doi.org/10.1371/journal.pcbi.1001138
State: Published, May 1, 2011
Externally published: Yes

Fingerprint

gene fusion
Genetic Association Studies
tumor
Tumors
ovarian neoplasms
Fusion
Fusion reactions
Genes
RNA
Gene
cancer
neoplasms
Ovarian Cancer
Transcriptome

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

McPherson, A., Hormozdiari, F., Zayed, A., Giuliany, R., Ha, G., Sun, M. G. F., ... Shah, S. P. (2011). deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Computational Biology, 7(5), e1001138. https://doi.org/10.1371/journal.pcbi.1001138

deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data. / McPherson, Andrew; Hormozdiari, Fereydoun; Zayed, Abdalnasser; Giuliany, Ryan; Ha, Gavin; Sun, Mark G.F.; Griffith, Malachi; Moussavi, Alireza; Senz, Janine; Melnyk, Nataliya; Pacheco, Marina; Marra, Marco A.; Hirst, Martin; Nielsen, Torsten O.; Sahinalp, S. Cenk; Huntsman, David; Shah, Sohrab P.

In: PLoS Computational Biology, Vol. 7, No. 5, e1001138, 01.05.2011.

McPherson, A, Hormozdiari, F, Zayed, A, Giuliany, R, Ha, G, Sun, MGF, Griffith, M, Moussavi, A, Senz, J, Melnyk, N, Pacheco, M, Marra, MA, Hirst, M, Nielsen, TO, Sahinalp, SC, Huntsman, D & Shah, SP 2011, 'deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data', PLoS Computational Biology, vol. 7, no. 5, e1001138. https://doi.org/10.1371/journal.pcbi.1001138

McPherson, Andrew; Hormozdiari, Fereydoun; Zayed, Abdalnasser; Giuliany, Ryan; Ha, Gavin; Sun, Mark G.F.; Griffith, Malachi; Moussavi, Alireza; Senz, Janine; Melnyk, Nataliya; Pacheco, Marina; Marra, Marco A.; Hirst, Martin; Nielsen, Torsten O.; Sahinalp, S. Cenk; Huntsman, David; Shah, Sohrab P. / deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data. In: PLoS Computational Biology. 2011; Vol. 7, No. 5.

@article{1ade090a84e942049d1fca728eb94218,
title = "{deFuse}: An algorithm for gene fusion discovery in tumor {RNA-Seq} data",
author = "Andrew McPherson and Fereydoun Hormozdiari and Abdalnasser Zayed and Ryan Giuliany and Gavin Ha and Sun, {Mark G.F.} and Malachi Griffith and Alireza Moussavi and Janine Senz and Nataliya Melnyk and Marina Pacheco and Marra, {Marco A.} and Martin Hirst and Nielsen, {Torsten O.} and Sahinalp, {S. Cenk} and David Huntsman and Shah, {Sohrab P.}",
year = "2011",
month = "5",
day = "1",
doi = "10.1371/journal.pcbi.1001138",
language = "English (US)",
volume = "7",
number = "5",
pages = "e1001138",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
}

TY  - JOUR
T1  - deFuse: An algorithm for gene fusion discovery in tumor RNA-Seq data
AU  - McPherson, Andrew
AU  - Hormozdiari, Fereydoun
AU  - Zayed, Abdalnasser
AU  - Giuliany, Ryan
AU  - Ha, Gavin
AU  - Sun, Mark G.F.
AU  - Griffith, Malachi
AU  - Moussavi, Alireza
AU  - Senz, Janine
AU  - Melnyk, Nataliya
AU  - Pacheco, Marina
AU  - Marra, Marco A.
AU  - Hirst, Martin
AU  - Nielsen, Torsten O.
AU  - Sahinalp, S. Cenk
AU  - Huntsman, David
AU  - Shah, Sohrab P.
PY  - 2011/5/1
Y1  - 2011/5/1
UR  - http://www.scopus.com/inward/record.url?scp=79957829805&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79957829805&partnerID=8YFLogxK
U2  - 10.1371/journal.pcbi.1001138
DO  - 10.1371/journal.pcbi.1001138
M3  - Article
C2  - 21625565
AN  - SCOPUS:79957829805
VL  - 7
JO  - PLoS Computational Biology
JF  - PLoS Computational Biology
SN  - 1553-734X
IS  - 5
M1  - e1001138
ER  -