1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

Supratim Mukherjee, Rekha Seshadri, Neha J. Varghese, Emiley A. Eloe-Fadrosh, Jan P. Meier-Kolthoff, Markus Göker, R. Cameron Coates, Michalis Hadjithomas, Georgios A. Pavlopoulos, David Paez-Espino, Yasuo Yoshikuni, Axel Visel, William B. Whitman, George M. Garrity, Jonathan A. Eisen, Philip Hugenholtz, Amrita Pati, Natalia N. Ivanova, Tanja Woyke, Hans Peter Klenk, Nikos C. Kyrpides

Research output: Contribution to journal (Article, peer-reviewed)

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.
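The abstract reports that the new genomes expand phylogenetic diversity (PD) by 25%. As a hedged illustration only, and assuming PD is meant in the usual Faith sense (the total branch length of the part of a rooted tree spanned by a set of genomes), the relative gain from adding genomes can be sketched as below. The toy tree, node names and the functions `faith_pd` and `relative_pd_gain` are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy tree (not from the paper): node -> (parent, branch length).
# The root itself has no entry.
Tree = Dict[str, Tuple[str, float]]

TOY_TREE: Tree = {
    "A_int": ("root", 1.0),
    "a": ("A_int", 2.0),
    "b": ("A_int", 2.0),
    "B_int": ("root", 1.5),
    "c": ("B_int", 1.0),
    "d": ("B_int", 3.0),
}


def faith_pd(tree: Tree, tips: Iterable[str]) -> float:
    """Rooted Faith's PD: sum of unique branch lengths on tip-to-root paths."""
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root; stop early once we reach an edge already
        # counted, since everything above it was counted on an earlier walk.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def relative_pd_gain(tree: Tree, existing: Iterable[str], added: Iterable[str]) -> float:
    """Fractional PD increase when `added` tips join the `existing` set."""
    existing = set(existing)
    base = faith_pd(tree, existing)
    combined = faith_pd(tree, existing | set(added))
    return (combined - base) / base


if __name__ == "__main__":
    gain = relative_pd_gain(TOY_TREE, {"a", "b", "c"}, {"d"})
    print(f"{gain:.0%}")  # prints 40%
```

In this toy case the existing tips span 7.5 branch-length units and adding `d` contributes one new 3.0-unit branch, giving a 40% gain. A real analysis would run the same idea on a reference phylogeny built from marker genes.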

Original language: English (US)
Pages (from-to): 676-683
Number of pages: 8
Journal: Nature Biotechnology
Issue number: 7
State: Published, Jul 1 2017

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Applied Microbiology and Biotechnology
  • Biomedical Engineering
  • Molecular Medicine
