Comrad: Detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data

Andrew McPherson, Chunxiao Wu, Iman Hajirasouliha, Fereydoun Hormozdiari, Faraz Hach, Anna Lapuk, Stanislav Volik, Sohrab Shah, Colin Collins, S. Cenk Sahinalp

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Motivation: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and whole genome shotgun sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate classification of the genomic or non-genomic origin of aberrant transcripts. A major benefit of Comrad is its ability to accurately identify aberrant transcripts and associated rearrangements using low coverage genome data. As a result, a Comrad analysis can be performed at a cost comparable to that of two RNA-Seq experiments, significantly lower than an analysis requiring high coverage genome data.

Results: We have applied Comrad to the discovery of gene fusions and read-throughs in prostate cancer cell line C4-2, a derivative of the LNCaP cell line with androgen-independent characteristics. As a proof of concept, we have rediscovered in the C4-2 data 4 of the 6 fusions previously identified in LNCaP. We also identified six novel fusion transcripts and associated genomic breakpoints, and verified their existence in LNCaP, suggesting that Comrad may be more sensitive than previous methods that have been applied to fusion discovery in LNCaP. We show that many of the gene fusions discovered using Comrad would be difficult to identify using currently available techniques.
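The abstract describes two classification axes: whether a rearrangement is expressed (supported by RNA-Seq) and whether an aberrant transcript has a genomic origin (supported by WGSS). The toy sketch below only illustrates how those two axes combine into categories. It is not Comrad's algorithm, which this record does not describe; the `Breakpoint` fields, the `classify` function and the read-count thresholds are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    EXPRESSED_REARRANGEMENT = "genomic, expressed"
    SILENT_REARRANGEMENT = "genomic, not expressed"
    NON_GENOMIC_TRANSCRIPT = "non-genomic aberrant transcript (e.g. read-through)"
    UNSUPPORTED = "insufficient evidence"


@dataclass
class Breakpoint:
    # Hypothetical evidence counts for one candidate junction.
    rna_reads: int  # RNA-Seq reads supporting the transcript junction
    dna_reads: int  # WGSS reads supporting a genomic breakpoint


def classify(bp: Breakpoint, min_rna: int = 2, min_dna: int = 1) -> Category:
    """Toy two-axis classification using hypothetical fixed thresholds."""
    has_rna = bp.rna_reads >= min_rna
    has_dna = bp.dna_reads >= min_dna
    if has_dna and has_rna:
        return Category.EXPRESSED_REARRANGEMENT
    if has_dna:
        return Category.SILENT_REARRANGEMENT
    if has_rna:
        # Caveat: with low coverage WGSS, missing DNA reads is weak evidence,
        # so this naive rule can mislabel a genomic event as non-genomic.
        return Category.NON_GENOMIC_TRANSCRIPT
    return Category.UNSUPPORTED


for bp in [Breakpoint(5, 3), Breakpoint(0, 2), Breakpoint(4, 0)]:
    print(bp, classify(bp).value)
```

The fixed thresholds are the weak point of such a rule. The abstract claims accurate classification from low coverage genome data, which a per-event threshold on sparse WGSS evidence would not deliver on its own.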

Original language: English (US)
Article number: btr184
Pages (from-to): 1481-1488
Number of pages: 8
Journal: Bioinformatics
Volume: 27
Issue number: 11
DOI: https://doi.org/10.1093/bioinformatics/btr184
State: Published - Jan 1 2011
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Comrad: Detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data. / McPherson, Andrew; Wu, Chunxiao; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Hach, Faraz; Lapuk, Anna; Volik, Stanislav; Shah, Sohrab; Collins, Colin; Sahinalp, S. Cenk.

In: Bioinformatics, Vol. 27, No. 11, btr184, 01.01.2011, p. 1481-1488.

McPherson, A, Wu, C, Hajirasouliha, I, Hormozdiari, F, Hach, F, Lapuk, A, Volik, S, Shah, S, Collins, C & Sahinalp, SC 2011, 'Comrad: Detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data', Bioinformatics, vol. 27, no. 11, btr184, pp. 1481-1488. https://doi.org/10.1093/bioinformatics/btr184
McPherson, Andrew; Wu, Chunxiao; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Hach, Faraz; Lapuk, Anna; Volik, Stanislav; Shah, Sohrab; Collins, Colin; Sahinalp, S. Cenk. / Comrad: Detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data. In: Bioinformatics. 2011; Vol. 27, No. 11. pp. 1481-1488.
@article{6b2cc1646e734e9ca84a46b9e60dd83b,
title = "{Comrad}: Detection of expressed rearrangements by integrated analysis of {RNA-Seq} and low coverage genome sequence data",
abstract = "Motivation: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and whole genome shotgun sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate classification of the genomic or non-genomic origin of aberrant transcripts. A major benefit of Comrad is its ability to accurately identify aberrant transcripts and associated rearrangements using low coverage genome data. As a result, a Comrad analysis can be performed at a cost comparable to that of two RNA-Seq experiments, significantly lower than an analysis requiring high coverage genome data. Results: We have applied Comrad to the discovery of gene fusions and read-throughs in prostate cancer cell line C4-2, a derivative of the LNCaP cell line with androgen-independent characteristics. As a proof of concept, we have rediscovered in the C4-2 data 4 of the 6 fusions previously identified in LNCaP. We also identified six novel fusion transcripts and associated genomic breakpoints, and verified their existence in LNCaP, suggesting that Comrad may be more sensitive than previous methods that have been applied to fusion discovery in LNCaP. We show that many of the gene fusions discovered using Comrad would be difficult to identify using currently available techniques.",
author = "Andrew McPherson and Chunxiao Wu and Iman Hajirasouliha and Fereydoun Hormozdiari and Faraz Hach and Anna Lapuk and Stanislav Volik and Sohrab Shah and Colin Collins and Sahinalp, {S. Cenk}",
year = "2011",
month = jan,
day = "1",
doi = "10.1093/bioinformatics/btr184",
language = "English (US)",
volume = "27",
pages = "1481--1488",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",

}

TY  - JOUR
T1  - Comrad: Detection of expressed rearrangements by integrated analysis of RNA-Seq and low coverage genome sequence data
AU  - McPherson, Andrew
AU  - Wu, Chunxiao
AU  - Hajirasouliha, Iman
AU  - Hormozdiari, Fereydoun
AU  - Hach, Faraz
AU  - Lapuk, Anna
AU  - Volik, Stanislav
AU  - Shah, Sohrab
AU  - Collins, Colin
AU  - Sahinalp, S. Cenk
PY  - 2011/1/1
Y1  - 2011/1/1
AB  - Motivation: Comrad is a novel algorithmic framework for the integrated analysis of RNA-Seq and whole genome shotgun sequencing (WGSS) data for the purposes of discovering genomic rearrangements and aberrant transcripts. The Comrad framework leverages the advantages of both RNA-Seq and WGSS data, providing accurate classification of rearrangements as expressed or not expressed and accurate classification of the genomic or non-genomic origin of aberrant transcripts. A major benefit of Comrad is its ability to accurately identify aberrant transcripts and associated rearrangements using low coverage genome data. As a result, a Comrad analysis can be performed at a cost comparable to that of two RNA-Seq experiments, significantly lower than an analysis requiring high coverage genome data. Results: We have applied Comrad to the discovery of gene fusions and read-throughs in prostate cancer cell line C4-2, a derivative of the LNCaP cell line with androgen-independent characteristics. As a proof of concept, we have rediscovered in the C4-2 data 4 of the 6 fusions previously identified in LNCaP. We also identified six novel fusion transcripts and associated genomic breakpoints, and verified their existence in LNCaP, suggesting that Comrad may be more sensitive than previous methods that have been applied to fusion discovery in LNCaP. We show that many of the gene fusions discovered using Comrad would be difficult to identify using currently available techniques.
UR  - http://www.scopus.com/inward/record.url?scp=79957875348&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79957875348&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btr184
DO  - 10.1093/bioinformatics/btr184
M3  - Article
C2  - 21478487
AN  - SCOPUS:79957875348
VL  - 27
SP  - 1481
EP  - 1488
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 11
M1  - btr184
ER  - 