Abstract
The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and as promising biomarkers for disease. Concurrent developments in high-throughput profiling technology have further advanced our understanding of the impact of their dysregulation on a global scale. Next-generation sequencing is currently the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Although often overlooked, data preprocessing is an essential step in data analysis: unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M-values, DESeq, linear regression, cyclic loess and quantile) on the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
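The count-based normalization methods named above differ mainly in how they make samples sequenced to different depths comparable. The Python sketch below illustrates four of them on a miRNA-by-sample count matrix: counts-per-million, a simplified upper quartile scaling, DESeq-style median-of-ratios size factors and quantile normalization. It is a minimal illustration under stated assumptions, not the code used in this study. The function names and the toy matrix are hypothetical, the upper quartile variant is simplified, and reference implementations of these methods are provided by Bioconductor packages such as edgeR and DESeq.

```python
import math
import statistics

# Illustrative sketch only; not the study's implementation.
# Layout assumption: counts[i][j] is the raw read count of miRNA i in sample j.


def cpm(counts):
    """Counts-per-million: scale each sample by its library size (total reads)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for row in counts]


def upper_quartile_factors(counts):
    """Simplified upper quartile scaling: each sample's 75th percentile of
    non-zero counts, divided by the mean of those percentiles across samples."""
    n_samples = len(counts[0])
    uqs = []
    for j in range(n_samples):
        nonzero = [row[j] for row in counts if row[j] > 0]
        # method="inclusive" interpolates like R's default quantile (type 7).
        uqs.append(statistics.quantiles(nonzero, n=4, method="inclusive")[2])
    mean_uq = statistics.mean(uqs)
    return [uq / mean_uq for uq in uqs]


def deseq_size_factors(counts):
    """DESeq-style median-of-ratios size factors."""
    # Only miRNAs observed in every sample have a finite geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(len(counts[0])):
        # Median (in log space) of each count relative to its miRNA's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def quantile_normalize(counts):
    """Force every sample to share the same empirical distribution."""
    n_genes, n_samples = len(counts), len(counts[0])
    # Row indices ordered from smallest to largest count within each sample;
    # ties are broken by row order, a simplification.
    orders = [sorted(range(n_genes), key=lambda i: counts[i][j]) for j in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [statistics.mean(counts[orders[j][k]][j] for j in range(n_samples))
                 for k in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for rank, i in enumerate(orders[j]):
            out[i][j] = reference[rank]
    return out


if __name__ == "__main__":
    # Hypothetical toy matrix: samples 2 and 3 are sample 1 sequenced 2x and 3x deeper.
    toy = [[10, 20, 30], [100, 200, 300], [5, 10, 15], [50, 100, 150]]
    print(cpm(toy))
    print(upper_quartile_factors(toy))
    print(deseq_size_factors(toy))
    print(quantile_normalize(toy))
```

On this toy matrix, the upper quartile factors come out as 0.5, 1.0 and 1.5, and the DESeq size factors stand in the same 1:2:3 ratio. The two sets are centered differently: the upper quartile factors average to 1, while the DESeq factors multiply to 1. Real miRNA data are far less tidy. A few highly abundant miRNAs often dominate the library, which is one reason the choice of method matters.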
Original language | English (US) |
---|---|
Article number | bbv019 |
Pages (from-to) | 950-963 |
Number of pages | 14 |
Journal | Briefings in Bioinformatics |
Volume | 16 |
Issue number | 6 |
State | Published - Feb 6 2015 |
Externally published | Yes |
Keywords
- Data preprocessing
- miRNA sequencing
- miRNA-seq normalization
- Small RNA sequence alignment
ASJC Scopus subject areas
- Information Systems
- Molecular Biology