Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity

C. Titus Brown, Dominik Moritz, Michael P. O'Brien, Felix Reidl, Taylor Reiter, Blair D. Sullivan

Research output: Contribution to journal › Article

Abstract

Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
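To make "extract subgraphs surrounding an inferred genome" concrete, here is a minimal sketch of neighborhood retrieval on a toy graph. It is loosely based on the dominating-set idea named in the keywords, and it assumes an undirected graph stored as a Python dict of adjacency lists. The first step greedily picks a set of nodes so that every node lies within a radius `r` of one of them. The second step assigns every node to its nearest chosen node, forming a "piece". A query then returns the union of the pieces that contain the query nodes. The function names (`greedy_r_dominating_set`, `assign_pieces`, `query_neighborhood`) are hypothetical and used only for illustration. This is not the spacegraphcats implementation, which also relies on structural properties of sparse (bounded-expansion) graphs.

```python
from collections import deque

def greedy_r_dominating_set(graph, radius):
    """Hypothetical helper: choose nodes so every node is within `radius` hops of one."""
    dominated, dominators = set(), []
    for node in sorted(graph):
        if node in dominated:
            continue
        dominators.append(node)
        # BFS out to `radius` hops, marking every reached node as dominated.
        dist = {node: 0}
        queue = deque([node])
        while queue:
            u = queue.popleft()
            dominated.add(u)
            if dist[u] == radius:
                continue
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return dominators

def assign_pieces(graph, dominators):
    """Hypothetical helper: multi-source BFS mapping each node to its nearest dominator."""
    owner = {d: d for d in dominators}
    queue = deque(dominators)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in owner:
                owner[v] = owner[u]
                queue.append(v)
    return owner

def query_neighborhood(owner, query_nodes):
    """Hypothetical helper: union of all pieces that contain at least one query node."""
    hit = {owner[q] for q in query_nodes}
    return {n for n, d in owner.items() if d in hit}

# Toy example: a path graph 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
doms = greedy_r_dominating_set(path, radius=1)  # [0, 2, 4, 6]
owner = assign_pieces(path, doms)
print(sorted(query_neighborhood(owner, [3])))   # [2, 3]
```

The pieces are computed once, so each later query only needs to look up piece owners and does not repeat a graph search. This is the general motivation for indexing a large graph before querying it.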

Original language: English (US)
Article number: 164
Journal: Genome Biology
Volume: 21
Issue number: 1
State: Published - Jul 6 2020

Keywords

  • Bounded expansion
  • Dominating set
  • Metagenomics
  • Sequence assembly
  • Strain variation

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology
