Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]

N. Tessa Pierce, Luiz Irber, Taylor Reiter, Phillip Brooks, Charles Brown

Research output: Contribution to journal › Article

3 Scopus citations

Abstract

The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
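
The sketching idea in the abstract can be illustrated with a minimal bottom-k MinHash in plain Python. This is a toy sketch under stated assumptions, not sourmash's implementation or API: the function names (`kmers`, `hash_kmer`, `sketch`, `jaccard_estimate`), the use of SHA-1 as the hash, and the default values of `k` and `num` are all illustrative choices made here. It only shows how a small set of retained hash values can stand in for a full k-mer set when estimating Jaccard similarity.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer. SHA-1 is an illustrative choice only."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=5, num=50):
    """Bottom-k sketch: the num smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:num]


def jaccard_estimate(a, b, num=50):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the num smallest hashes of the merged sketches, then count how
    many of them occur in both inputs.
    """
    union = sorted(set(a) | set(b))[:num]
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)
```

For example, `jaccard_estimate(sketch(x), sketch(y))` returns a value between 0 and 1 for two sequence strings `x` and `y`. Each sketch holds at most `num` integers regardless of sequence length, which is what keeps comparisons between very large data sets fast and low in memory.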

Original language: English (US)
Article number: 1006
Journal: F1000Research
Volume: 8
State: Published - Jan 1 2019

Keywords

  • Bioinformatics
  • K-mer
  • MinHash
  • Sequence analysis
  • Sourmash

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Pharmacology, Toxicology and Pharmaceutics(all)
