Optimization of miRNA-seq data preprocessing

Shirley Tam, Ming Sound Tsao, John Douglas Mcpherson

Research output: Contribution to journal › Article

42 Scopus citations

Abstract

The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and as promising biomarkers for disease. The concurrent development of high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Although often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M-values, DESeq, linear regression, cyclic loess and quantile) on the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
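The abstract lists several count normalization methods without stating their formulas. As a rough illustration only, the Python sketch below implements four of them (counts-per-million, upper quartile scaling, DESeq-style median-of-ratios size factors and quantile normalization) on a small in-memory count table, assuming miRNAs are rows and samples are columns. The function names, the table layout and the filtering choices (for example, computing the upper quartile only over miRNAs detected in every sample) are hypothetical choices made for this example, not the authors' implementation or parameters; TMM, linear regression and cyclic loess are not shown.

```python
"""Minimal sketches of four count-normalization schemes named in the abstract.

Assumption: `counts` is a list of rows (one row per miRNA); columns are samples.
These are illustrative re-implementations, not the code used in the study.
"""
import math
import statistics


def library_sizes(counts):
    """Total counts per sample (column sums)."""
    return [sum(col) for col in zip(*counts)]


def cpm(counts):
    """Counts-per-million: scale each column by 1e6 / library size."""
    sizes = library_sizes(counts)
    return [[c * 1_000_000 / n for c, n in zip(row, sizes)] for row in counts]


def upper_quartile(counts):
    """75th percentile of each sample's counts, restricted (an assumption)
    to miRNAs with a nonzero count in every sample."""
    expressed = [row for row in counts if all(c > 0 for c in row)]
    return [statistics.quantiles(col, n=4, method="inclusive")[2]
            for col in zip(*expressed)]


def deseq_size_factors(counts):
    """Median-of-ratios size factors: each sample's counts are divided by the
    per-miRNA geometric mean across samples, and the median ratio is taken."""
    usable = [row for row in counts if all(c > 0 for c in row)]
    geo_means = [math.exp(statistics.fmean(math.log(c) for c in row))
                 for row in usable]
    return [statistics.median(c / g for c, g in zip(col, geo_means))
            for col in zip(*usable)]


def quantile_normalize(counts):
    """Give every sample the same distribution: the rank-wise mean of the
    sorted columns. Ties are broken by row order, not averaged (a simplification)."""
    n_rows, n_cols = len(counts), len(counts[0])
    columns = list(zip(*counts))
    sorted_cols = [sorted(col) for col in columns]
    # Mean across samples of the k-th smallest value, for each rank k.
    rank_means = [statistics.fmean(vals) for vals in zip(*sorted_cols)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[i][j] = rank_means[rank]
    return out


if __name__ == "__main__":
    # Toy data: sample 2 is sample 1 sequenced twice as deeply.
    toy = [[10, 20], [20, 40], [70, 140]]
    print(cpm(toy))                 # identical CPM values in both columns
    print(upper_quartile(toy))      # [45.0, 90.0]
    print(deseq_size_factors(toy))  # second factor is twice the first
    print(quantile_normalize(toy))  # [[15.0, 15.0], [30.0, 30.0], [105.0, 105.0]]
```

In the toy table the second sample differs from the first only in sequencing depth, so each method should cancel that difference: CPM and quantile normalization yield identical columns, and both the upper quartile ratio and the size-factor ratio come out at 2. Real miRNA-seq data are dominated by a few highly abundant miRNAs, which is one reason the choice among these methods matters.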

Original language: English (US)
Article number: bbv019
Pages (from-to): 950-963
Number of pages: 14
Journal: Briefings in Bioinformatics
Volume: 16
Issue number: 6
State: Published - Feb 6 2015
Externally published: Yes


Keywords

  • Data preprocessing
  • miRNA sequencing
  • miRNA-seq normalization
  • Small RNA sequence alignment

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology
