Optimization of miRNA-seq data preprocessing

Shirley Tam, Ming Sound Tsao, John Douglas Mcpherson

Research output: Contribution to journal › Article

40 Citations (Scopus)

Abstract

The past two decades of microRNA (miRNA) research has solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. The concurrent development in high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign), and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) with respect to the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
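The abstract names several count normalizations without defining them. As rough orientation, the sketch below implements simplified textbook forms of five of them (counts-per-million, total count scaling, upper quartile scaling, DESeq-style median-of-ratios size factors and quantile normalization) in plain Python. It is an illustration under assumptions, not the code benchmarked in the study: the dict-of-lists table layout, the function names, the use of the mean library size as the scaling target, the inclusive 75th-percentile definition and the positional tie-breaking in quantile normalization are all choices made here. Trimmed Mean of M, linear regression and cyclic loess are omitted.

```python
# Hypothetical, simplified sketches of some normalizations named in the abstract.
# A count table is a dict: sample name -> list of read counts, one entry per
# miRNA, with the same miRNA order in every sample.
import math
import statistics
from typing import Dict, List

Counts = Dict[str, List[float]]


def cpm(counts: Counts) -> Counts:
    """Counts-per-million: divide by library size, multiply by 1e6."""
    out = {}
    for s, col in counts.items():
        lib_size = sum(col)
        out[s] = [c / lib_size * 1e6 for c in col]
    return out


def total_count(counts: Counts) -> Counts:
    """Total count scaling (assumed variant): rescale every library so its
    total equals the mean library size across samples."""
    mean_lib = statistics.mean(sum(col) for col in counts.values())
    out = {}
    for s, col in counts.items():
        lib_size = sum(col)
        out[s] = [c * mean_lib / lib_size for c in col]
    return out


def upper_quartile(counts: Counts) -> Counts:
    """Upper quartile scaling: like total_count, but each sample is scaled by
    the 75th percentile of its non-zero counts instead of its total."""
    uq = {
        s: statistics.quantiles([c for c in col if c > 0], n=4, method="inclusive")[2]
        for s, col in counts.items()
    }
    mean_uq = statistics.mean(uq.values())
    return {s: [c * mean_uq / uq[s] for c in col] for s, col in counts.items()}


def deseq_size_factors(counts: Counts) -> Dict[str, float]:
    """Median-of-ratios size factors (the DESeq approach).

    A sample's factor is the median, over miRNAs, of its count divided by the
    geometric mean of that miRNA's counts across samples. miRNAs with a zero in
    any sample are skipped (raises StatisticsError if none remain).
    """
    samples = list(counts)
    n_features = len(counts[samples[0]])
    log_ratios: Dict[str, List[float]] = {s: [] for s in samples}
    for i in range(n_features):
        row = [counts[s][i] for s in samples]
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / len(row)
        for s, c in zip(samples, row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return {s: math.exp(statistics.median(r)) for s, r in log_ratios.items()}


def quantile_normalize(counts: Counts) -> Counts:
    """Quantile normalization: every sample receives the same distribution,
    the mean of the sorted columns. Ties are broken by position here rather
    than averaged."""
    samples = list(counts)
    n = len(counts[samples[0]])
    sorted_cols = [sorted(counts[s]) for s in samples]
    rank_means = [statistics.mean(col[k] for col in sorted_cols) for k in range(n)]
    out = {}
    for s in samples:
        order = sorted(range(n), key=lambda i: counts[s][i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        out[s] = normalized
    return out


if __name__ == "__main__":
    # Made-up toy table in which sample B is exactly twice sample A.
    toy = {"A": [10, 20, 30, 40], "B": [20, 40, 60, 80]}
    print(deseq_size_factors(toy))  # B's factor is twice A's
    print(upper_quartile(toy))      # both columns become identical
```

On this toy table the median-of-ratios factors come out in a 2:1 ratio and upper quartile scaling makes the two columns identical, which is the behaviour each method is designed to have when the only difference between samples is sequencing depth.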

Original language: English (US)
Article number: bbv019
Pages (from-to): 950-963
Number of pages: 14
Journal: Briefings in Bioinformatics
Volume: 16
Issue number: 6
DOI: 10.1093/bib/bbv019
State: Published - Feb 6 2015
Externally published: Yes

Fingerprint

RNA
MicroRNAs
Biomarkers
Linear regression
Dilution
RNA Sequence Analysis
Biological Phenomena
Small Untranslated RNA
Throughput
Noise
Linear Models
Technology
Experiments
Research

Keywords

  • Data preprocessing
  • miRNA sequencing
  • miRNA-seq normalization
  • Small RNA sequence alignment

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

Cite this

Optimization of miRNA-seq data preprocessing. / Tam, Shirley; Tsao, Ming Sound; Mcpherson, John Douglas.

In: Briefings in Bioinformatics, Vol. 16, No. 6, bbv019, 06.02.2015, p. 950-963.

Tam, Shirley ; Tsao, Ming Sound ; Mcpherson, John Douglas. / Optimization of miRNA-seq data preprocessing. In: Briefings in Bioinformatics. 2015 ; Vol. 16, No. 6. pp. 950-963.
@article{fdfbf2c577ce45619b0637475ad22fdb,
title = "Optimization of miRNA-seq data preprocessing",
keywords = "Data preprocessing, miRNA sequencing, miRNA-seq normalization, Small RNA sequence alignment",
author = "Tam, Shirley and Tsao, {Ming Sound} and Mcpherson, {John Douglas}",
year = "2015",
month = "2",
day = "6",
doi = "10.1093/bib/bbv019",
language = "English (US)",
volume = "16",
pages = "950--963",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford University Press",
number = "6",

}

TY  - JOUR
T1  - Optimization of miRNA-seq data preprocessing
AU  - Tam, Shirley
AU  - Tsao, Ming Sound
AU  - Mcpherson, John Douglas
PY  - 2015/2/6
Y1  - 2015/2/6
KW  - Data preprocessing
KW  - miRNA sequencing
KW  - miRNA-seq normalization
KW  - Small RNA sequence alignment
UR  - http://www.scopus.com/inward/record.url?scp=84938380221&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84938380221&partnerID=8YFLogxK
U2  - 10.1093/bib/bbv019
DO  - 10.1093/bib/bbv019
M3  - Article
C2  - 25888698
AN  - SCOPUS:84938380221
VL  - 16
SP  - 950
EP  - 963
JO  - Briefings in Bioinformatics
JF  - Briefings in Bioinformatics
SN  - 1467-5463
IS  - 6
M1  - bbv019
ER  -