Accounting for alignment uncertainty in phylogenomics

Martin Wu, Sourav Chatterji, Jonathan A Eisen

Research output: Contribution to journal (Article)

109 Citations (Scopus)

Abstract

Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BALIBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.
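The masking step described in the abstract can be illustrated with a minimal sketch. It assumes per-column confidence scores are already available and simply drops low-scoring columns before tree building. The function name mask_alignment, the toy sequences, the score values and the cutoff are all hypothetical. This is not ZORRO's scoring model or its actual interface, only the generic idea of filtering alignment positions by confidence.

```python
from typing import Dict, List, Sequence


def mask_alignment(
    alignment: Dict[str, str],
    column_scores: Sequence[float],
    min_score: float,
) -> Dict[str, str]:
    """Keep only the alignment columns whose score is at least min_score."""
    # Every aligned sequence must have exactly one character per scored column.
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(column_scores)}:
        raise ValueError("every sequence must have one residue or gap per score")
    # Indices of the columns that pass the (hypothetical) confidence cutoff.
    keep: List[int] = [i for i, s in enumerate(column_scores) if s >= min_score]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy data: three aligned protein fragments and made-up per-column scores.
toy_alignment = {
    "seq_a": "MK-LVTA",
    "seq_b": "MKQLV-A",
    "seq_c": "MR-LITA",
}
toy_scores = [9.5, 8.0, 1.2, 9.0, 7.5, 2.0, 8.8]

masked = mask_alignment(toy_alignment, toy_scores, min_score=5.0)
```

In this toy case the two gap-containing columns receive low scores and are removed, so the masked alignment passed to a tree-building program would contain only the five higher-confidence columns. The cutoff is a free parameter here; the abstract does not state one.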

Original language: English (US)
Article number: e30288
Journal: PLoS One
Volume: 7
Issue number: 1
DOI: 10.1371/journal.pone.0030288
State: Published - Jan 17 2012

Fingerprint

Uncertainty
Sequence Alignment
phylogeny
amino acid sequences
Databases
Proteins

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

Accounting for alignment uncertainty in phylogenomics. / Wu, Martin; Chatterji, Sourav; Eisen, Jonathan A.

In: PLoS One, Vol. 7, No. 1, e30288, 17.01.2012.

Wu, Martin ; Chatterji, Sourav ; Eisen, Jonathan A. / Accounting for alignment uncertainty in phylogenomics. In: PLoS One. 2012 ; Vol. 7, No. 1.

@article{fc2084ba75d843e7a6aea7e04f86180e,
title = "Accounting for alignment uncertainty in phylogenomics",
abstract = "Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BALIBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.",
author = "Martin Wu and Sourav Chatterji and Eisen, {Jonathan A}",
year = "2012",
month = "1",
day = "17",
doi = "10.1371/journal.pone.0030288",
language = "English (US)",
volume = "7",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "1",
}

TY - JOUR
T1 - Accounting for alignment uncertainty in phylogenomics
AU - Wu, Martin
AU - Chatterji, Sourav
AU - Eisen, Jonathan A
PY - 2012/1/17
Y1 - 2012/1/17
N2 - Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BALIBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.
AB - Uncertainty in multiple sequence alignments has a large impact on phylogenetic analyses. Little has been done to evaluate the quality of individual positions in protein sequence alignments, which directly impact the accuracy of phylogenetic trees. Here we describe ZORRO, a probabilistic masking program that accounts for alignment uncertainty by assigning confidence scores to each alignment position. Using the BALIBASE database and in simulation studies, we demonstrate that masking by ZORRO significantly reduces the alignment uncertainty and improves the tree accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84855824416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84855824416&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0030288
DO - 10.1371/journal.pone.0030288
M3 - Article
C2 - 22272325
AN - SCOPUS:84855824416
VL - 7
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 1
M1 - e30288
ER -