CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads

Sourav Chatterji, Ichitaro Yamazaki, Zhaojun Bai, Jonathan A Eisen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

82 Scopus citations

Abstract

A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of these metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without the need for assembly or training. CompostBin uses a novel weighted PCA algorithm to project the high-dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm's accuracy on a variety of low- to medium-complexity data sets.
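
As a rough illustration of the composition-based idea in the abstract, the Python sketch below turns each read into a k-mer frequency vector and projects those vectors onto their first principal component. It is not the CompostBin implementation. It substitutes plain, unweighted PCA (computed by power iteration) for the paper's weighted PCA, and it leaves out normalized cut clustering entirely. The function names, the choice of k = 2, and the synthetic AT-only and GC-only reads are all assumptions made for this example.

```python
import math
import random
from itertools import product


def kmer_profile(read, k=2):
    """Normalized k-mer frequencies of a read, ordered lexicographically over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])  # k-mers containing N etc. are skipped
        if j is not None:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def first_principal_component(rows, iterations=200, seed=0):
    """Unweighted PCA stand-in: power iteration on the centered data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in rows]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        nxt = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(w * w for w in nxt))
        if norm == 0.0:
            break
        v = [w / norm for w in nxt]
    return mean, v


def project(rows, mean, v):
    """Coordinate of each row along the axis v after centering."""
    return [sum((x - m) * w for x, m, w in zip(row, mean, v)) for row in rows]


def toy_reads(alphabet, n, length, seed):
    """Hypothetical test data: random reads drawn from a restricted alphabet."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]


if __name__ == "__main__":
    reads = toy_reads("AT", 20, 200, seed=1) + toy_reads("GC", 20, 200, seed=2)
    profiles = [kmer_profile(r) for r in reads]
    mean, axis = first_principal_component(profiles)
    for read, score in zip(reads, project(profiles, mean, axis)):
        print(read[:12], round(score, 3))
```

On this toy input the two groups of reads land on opposite sides of zero along the first component, which is the kind of low-dimensional separation a clustering step could then exploit. Real metagenomic reads are far less cleanly separated than these synthetic ones.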

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 17-28
Number of pages: 12
Volume: 4955 LNBI
DOIs: https://doi.org/10.1007/978-3-540-78839-3_3
State: Published - 2008
Event: "12th Annual International Conference on REsearch in COmputational Molecular Biology, RECOMB 2008" - Singapore, Singapore
Duration: Mar 30 2008 to Apr 2 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4955 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: "12th Annual International Conference on REsearch in COmputational Molecular Biology, RECOMB 2008"
Country: Singapore
City: Singapore
Period: 3/30/08 to 4/2/08

Keywords

  • Binning
  • DNA composition metrics
  • Feature extraction
  • Genome signatures
  • Metagenomics
  • Normalized cut
  • Weighted PCA

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Chatterji, S., Yamazaki, I., Bai, Z., & Eisen, J. A. (2008). CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4955 LNBI, pp. 17-28). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4955 LNBI). https://doi.org/10.1007/978-3-540-78839-3_3