SAMSA2: A standalone metatranscriptome analysis pipeline

Samuel T. Westreich, Michelle L. Treiber, David A. Mills, Ian F Korf, Danielle G. Lemay

Research output: Contribution to journal (Article)

Abstract

Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because they generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.

Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline, redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster because it uses the DIAMOND aligner, and it is more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, example input and output files, and example master scripts for full pipeline execution.

Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered, or customized to fit the needs of any experiment.
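The abstract notes that every read must be compared with reference sequences and that the pipeline reports simplified, quantified output. As an illustration only, the Python sketch below shows one common way to turn aligner hits into per-reference read counts. It keeps the highest-bitscore hit for each read in DIAMOND's default tabular output (the 12-column, BLAST-style `--outfmt 6` layout) and then tallies reads per reference. This is not SAMSA2's code. The function names `best_hits` and `count_by_reference` and the example read and protein IDs are hypothetical, and the one-best-hit-per-read rule is an assumed simplification.

```python
from collections import Counter

# Column order of DIAMOND's default tabular output (--outfmt 6), same as BLAST outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
QUERY_COL = FIELDS.index("qseqid")
SUBJECT_COL = FIELDS.index("sseqid")
BITSCORE_COL = FIELDS.index("bitscore")


def best_hits(lines):
    """Hypothetical helper: map each read ID to its (reference ID, bitscore) best hit.

    Ties keep the first hit seen, because only a strictly higher bitscore replaces it.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        query = cols[QUERY_COL]
        subject = cols[SUBJECT_COL]
        bitscore = float(cols[BITSCORE_COL])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def count_by_reference(lines):
    """Hypothetical helper: count reads assigned to each reference (one vote per read)."""
    return Counter(subject for subject, _ in best_hits(lines).values())


# Made-up example rows: read1 hits two references, and only its best hit is counted.
example = [
    "read1\tprotA\t98.0\t30\t0\t0\t1\t90\t5\t34\t1e-10\t60.1",
    "read1\tprotB\t80.0\t30\t6\t0\t1\t90\t5\t34\t1e-5\t40.2",
    "read2\tprotA\t95.0\t30\t1\t0\t1\t90\t10\t39\t1e-9\t55.0",
    "read3\tprotC\t90.0\t28\t3\t0\t1\t84\t1\t28\t1e-7\t48.3",
]
print(count_by_reference(example))  # Counter({'protA': 2, 'protC': 1})
```

Because each read contributes a single count, the totals do not double-count reads that align to several references. A real pipeline might instead apply e-value or identity cutoffs, or map reference IDs to organisms and functions before counting.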

Original language: English (US)
Article number: 175
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
DOI: 10.1186/s12859-018-2189-z
State: Published, May 21, 2018


Keywords

  • Annotation
  • Bacteria
  • Bioinformatics
  • Cluster
  • Functions
  • GALAXY
  • Metagenomics
  • Metatranscriptome
  • Metatranscriptomics
  • Microbiome
  • Open access
  • Pipeline
  • RNA-seq
  • SAMSA
  • Software
  • Tool

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Westreich, S. T., Treiber, M. L., Mills, D. A., Korf, I. F., & Lemay, D. G. (2018). SAMSA2: A standalone metatranscriptome analysis pipeline. BMC Bioinformatics, 19(1), Article 175. https://doi.org/10.1186/s12859-018-2189-z

@article{70b4283580344df2909a5ec937033266,
title = "SAMSA2: A standalone metatranscriptome analysis pipeline",
abstract = "Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.",
keywords = "Annotation, Bacteria, Bioinformatics, Cluster, Functions, GALAXY, Metagenomics, Metatranscriptome, Metatranscriptomics, Microbiome, Open access, Pipeline, RNA-seq, SAMSA, Software, Tool",
author = "Westreich, {Samuel T.} and Treiber, {Michelle L.} and Mills, {David A.} and Korf, {Ian F} and Lemay, {Danielle G.}",
year = "2018",
month = "5",
day = "21",
doi = "10.1186/s12859-018-2189-z",
language = "English (US)",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
pages = "175",
}

TY - JOUR

T1 - SAMSA2: A standalone metatranscriptome analysis pipeline

AU - Westreich, Samuel T.

AU - Treiber, Michelle L.

AU - Mills, David A.

AU - Korf, Ian F

AU - Lemay, Danielle G.

PY - 2018/5/21

Y1 - 2018/5/21

AB - Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

KW - Annotation

KW - Bacteria

KW - Bioinformatics

KW - Cluster

KW - Functions

KW - GALAXY

KW - Metagenomics

KW - Metatranscriptome

KW - Metatranscriptomics

KW - Microbiome

KW - Open access

KW - Pipeline

KW - RNA-seq

KW - SAMSA

KW - Software

KW - Tool

UR - http://www.scopus.com/inward/record.url?scp=85047342214&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047342214&partnerID=8YFLogxK

DO - 10.1186/s12859-018-2189-z

M3 - Article

C2 - 29783945

AN - SCOPUS:85047342214

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 175

ER -