MQF and buffered MQF: quotient filters for efficient storage of k-mers with their counts and metadata

Moustafa Shokrof, C. Titus Brown, Tamer A. Mansour

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Specialized data structures are required for online algorithms to efficiently handle large sequencing datasets. The counting quotient filter (CQF), a compact hash table, can efficiently store k-mers with a skewed distribution.

Results: Here, we present the mixed-counters quotient filter (MQF) as a new variant of the CQF with novel counting and labeling systems. The new counting system adapts to a wider range of data distributions for increased space efficiency and is faster than the CQF for insertions and queries in most of the tested scenarios. A buffered version of the MQF can offload storage to disk, trading speed of insertions and queries for a significant memory reduction. The labeling system provides a flexible framework for assigning labels to member items while maintaining good data locality and a concise memory representation. These labels serve as a minimal perfect hash function but are approximately tenfold faster than BBhash, with no need to re-analyze the original data for further insertions or deletions.

Conclusions: The MQF is a flexible and efficient data structure that extends our ability to work with high-throughput sequencing data.
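The quotienting idea that quotient filters such as the CQF and MQF build on can be illustrated with a small Python sketch. It is not the MQF's implementation: the names `Q_BITS`, `R_BITS`, `fingerprint`, `split`, and `ToyQuotientCounter` are hypothetical, the BLAKE2b hash is an arbitrary choice, and a dictionary stands in for the packed slot array and counter encoding of a real filter. Each k-mer is hashed to a short fingerprint whose high bits (the quotient) select a home slot and whose low bits (the remainder) are what gets stored, so two k-mers with the same fingerprint share a count.

```python
import hashlib
from collections import defaultdict
from typing import Iterator, Tuple

Q_BITS = 8  # assumed number of quotient bits (selects the home slot)
R_BITS = 8  # assumed number of remainder bits (stored in the slot)


def fingerprint(kmer: str) -> int:
    """Hash a k-mer to a (Q_BITS + R_BITS)-bit fingerprint."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << (Q_BITS + R_BITS)) - 1)


def split(fp: int) -> Tuple[int, int]:
    """Split a fingerprint into (quotient, remainder)."""
    return fp >> R_BITS, fp & ((1 << R_BITS) - 1)


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


class ToyQuotientCounter:
    """Toy counter keyed by (quotient, remainder); not a packed filter."""

    def __init__(self) -> None:
        # quotient -> {remainder: count}
        self.slots = defaultdict(dict)

    def insert(self, kmer: str, count: int = 1) -> None:
        q, r = split(fingerprint(kmer))
        self.slots[q][r] = self.slots[q].get(r, 0) + count

    def count(self, kmer: str) -> int:
        # May overestimate if another k-mer has the same fingerprint.
        q, r = split(fingerprint(kmer))
        return self.slots.get(q, {}).get(r, 0)


counter = ToyQuotientCounter()
for km in kmers("ACGTACGTAC", 4):
    counter.insert(km)
```

Because only the fingerprint is kept, a query can report a count that is too high but never one that is too low, which is the sense in which such structures are inexact.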

Original language: English (US)
Article number: 71
Journal: BMC Bioinformatics
Volume: 22
Issue number: 1
State: Published - Dec 2021

Keywords

  • Compact hash tables
  • De Bruijn graphs
  • Inexact data structures
  • k-mers
  • NGS

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
