Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]

N. Tessa Pierce, Luiz Irber, Taylor Reiter, Phillip Brooks, Charles Brown

Research output: Contribution to journal (Article)

Abstract

The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
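The MinHash idea behind these signatures can be illustrated with a minimal sketch. This is not sourmash's implementation or API: the function names, the SHA-1-based stand-in hash, and the parameter defaults are assumptions chosen for illustration, and refinements such as reverse-complement handling are omitted. It keeps the smallest k-mer hashes of each sequence as a compressed representation and estimates Jaccard similarity from those small sketches alone.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, an assumption)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=5, num=20):
    """Bottom-`num` MinHash sketch: the `num` smallest distinct k-mer hashes."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:num]


def estimate_jaccard(a, b, num=20):
    """Estimate Jaccard similarity from two bottom-`num` sketches."""
    # The `num` smallest hashes of the union act as a random sample of it.
    merged = sorted(set(a) | set(b))[:num]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    # Fraction of the sampled union that is present in both sketches.
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would be genomes or reads.
    x = "ACGTACGTTGCAAGCTTAGGCTAACGT"
    y = "ACGTACGTTGCAAGCTTAGGCTTTCGA"
    print(estimate_jaccard(sketch(x), sketch(y)))
```

Because only the sketches are compared, the cost of a comparison depends on the sketch size rather than on the length of the original sequences, which is what makes similarity estimation between very large data sets fast and memory-light.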

Original language: English (US)
Article number: 1006
Journal: F1000Research
Volume: 8
DOI: 10.12688/f1000research.19675.1
State: Published - Jan 1 2019

Keywords

  • Bioinformatics
  • K-mer
  • MinHash
  • Sequence analysis
  • Sourmash

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Pharmacology, Toxicology and Pharmaceutics (all)

Cite this

Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]. / Pierce, N. Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, Charles.

In: F1000Research, Vol. 8, 1006, 01.01.2019.

Pierce, N. Tessa; Irber, Luiz; Reiter, Taylor; Brooks, Phillip; Brown, Charles. / Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]. In: F1000Research. 2019; Vol. 8.
@article{6d782b2467fb4a3fbb734e077b9fc91d,
title = "Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]",
abstract = "The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.",
keywords = "Bioinformatics, K-mer, MinHash, Sequence analysis, Sourmash",
author = "Pierce, {N. Tessa} and Luiz Irber and Taylor Reiter and Phillip Brooks and Charles Brown",
year = "2019",
month = "1",
day = "1",
doi = "10.12688/f1000research.19675.1",
language = "English (US)",
volume = "8",
journal = "F1000Research",
issn = "2046-1402",
publisher = "F1000 Research Ltd.",

}

TY - JOUR

T1 - Large-scale sequence comparisons with sourmash [version 1; peer review: 2 approved]

AU - Pierce, N. Tessa

AU - Irber, Luiz

AU - Reiter, Taylor

AU - Brooks, Phillip

AU - Brown, Charles

PY - 2019/1/1

Y1 - 2019/1/1

AB - The sourmash software package uses MinHash-based sketching to create “signatures”, compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.

KW - Bioinformatics

KW - K-mer

KW - MinHash

KW - Sequence analysis

KW - Sourmash

UR - http://www.scopus.com/inward/record.url?scp=85072034027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072034027&partnerID=8YFLogxK

U2 - 10.12688/f1000research.19675.1

DO - 10.12688/f1000research.19675.1

M3 - Article

C2 - 31508216

AN - SCOPUS:85072034027

VL - 8

JO - F1000Research

JF - F1000Research

SN - 2046-1402

M1 - 1006

ER -