An automated phylogenetic tree-based Small subunit rRNA Taxonomy and Alignment Pipeline (STAP)

Dongying Wu, Amber Hartman, Naomi Ward, Jonathan A Eisen

Research output: Contribution to journal › Article

45 Citations (Scopus)

Abstract

Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts.

Original language: English (US)
Article number: e2566
Journal: PLoS One
Volume: 3
Issue number: 7
DOI: 10.1371/journal.pone.0002566
State: Published - Jul 2 2008

Fingerprint

Sequence Alignment
Taxonomies
Pipelines
ribosomal RNA
taxonomy
phylogeny
Small Ribosome Subunits
rRNA Genes
Genes
Biodiversity
microorganisms
nucleotide sequences
Costs and Cost Analysis
species diversity
methodology
Throughput

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

An automated phylogenetic tree-based Small subunit rRNA Taxonomy and Alignment Pipeline (STAP). / Wu, Dongying; Hartman, Amber; Ward, Naomi; Eisen, Jonathan A.

In: PLoS One, Vol. 3, No. 7, e2566, 02.07.2008.

@article{047c99e0694a4559b74bf515d5d38df1,
title = "An automated phylogenetic tree-based {S}mall subunit {rRNA} {T}axonomy and {A}lignment {P}ipeline ({STAP})",
abstract = "Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts.",
author = "Wu, Dongying and Hartman, Amber and Ward, Naomi and Eisen, Jonathan A.",
year = "2008",
month = "7",
day = "2",
doi = "10.1371/journal.pone.0002566",
language = "English (US)",
volume = "3",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",
pages = "e2566",
}

TY - JOUR

T1 - An automated phylogenetic tree-based Small subunit rRNA Taxonomy and Alignment Pipeline (STAP)

AU - Wu, Dongying

AU - Hartman, Amber

AU - Ward, Naomi

AU - Eisen, Jonathan A

PY - 2008/7/2

Y1 - 2008/7/2

N2 - Comparative analysis of small-subunit ribosomal RNA (ss-rRNA) gene sequences forms the basis for much of what we know about the phylogenetic diversity of both cultured and uncultured microorganisms. As sequencing costs continue to decline and throughput increases, sequences of ss-rRNA genes are being obtained at an ever-increasing rate. This increasing flow of data has opened many new windows into microbial diversity and evolution, and at the same time has created significant methodological challenges. Those processes which commonly require time-consuming human intervention, such as the preparation of multiple sequence alignments, simply cannot keep up with the flood of incoming data. Fully automated methods of analysis are needed. Notably, existing automated methods avoid one or more steps that, though computationally costly or difficult, we consider to be important. In particular, we regard both the building of multiple sequence alignments and the performance of high quality phylogenetic analysis to be necessary. We describe here our fully-automated ss-rRNA taxonomy and alignment pipeline (STAP). It generates both high-quality multiple sequence alignments and phylogenetic trees, and thus can be used for multiple purposes including phylogenetically-based taxonomic assignments and analysis of species diversity in environmental samples. The pipeline combines publicly-available packages (PHYML, BLASTN and CLUSTALW) with our automatic alignment, masking, and tree-parsing programs. Most importantly, this automated process yields results comparable to those achievable by manual analysis, yet offers speed and capacity that are unattainable by manual efforts.

UR - http://www.scopus.com/inward/record.url?scp=50149086330&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50149086330&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0002566

DO - 10.1371/journal.pone.0002566

M3 - Article

C2 - 18596968

AN - SCOPUS:50149086330

VL - 3

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 7

M1 - e2566

ER -