A new method for DNA sequencing error verification and correction via an on-disk index tree

Yarong Gu, Xianying Liu, Qiang Zhu, Youchao Dong, Charles Brown, Sakti Pramanik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Existing sequencing error correction techniques require large amounts of expensive memory. In this work, we introduce a new disk-based sequencing error correction method to address this problem. The key idea is to use a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. With the BoND-tree, a set of special box queries that retrieve the relevant k-mers and their counts is processed efficiently. A comprehensive voting mechanism is then used to identify and correct an erroneous base in a genome sequence. Experiments indicate that the proposed method is promising for verifying and correcting sequencing errors in terms of both accuracy and scalability. Copyright is held by the author/owner(s).
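The abstract does not spell out the query shape or the voting rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an in-memory Counter as a stand-in for the on-disk BoND-tree, a "box" that fixes every position of a k-mer except one, and a hypothetical vote threshold min_ratio. The function names count_kmers, box_query, vote_base, and correct_read are illustrative only.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer in the reads (in-memory stand-in for the on-disk index)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def box_query(counts, kmer, pos):
    """Return {base: count} for stored k-mers that match `kmer` everywhere except `pos`.

    Assumed box shape: one fixed letter per position, any base allowed at `pos`.
    """
    result = {}
    for base in BASES:
        candidate = kmer[:pos] + base + kmer[pos + 1:]
        count = counts.get(candidate, 0)
        if count > 0:
            result[base] = count
    return result


def vote_base(counts, read, i, k):
    """Let every k-mer window covering read[i] vote for the base at position i."""
    votes = Counter()
    first = max(0, i - k + 1)
    last = min(i, len(read) - k)
    for start in range(first, last + 1):
        window = read[start:start + k]
        for base, count in box_query(counts, window, i - start).items():
            votes[base] += count
    return votes


def correct_read(counts, read, k, min_ratio=2.0):
    """Replace a base when another base outvotes it by at least min_ratio.

    min_ratio is a hypothetical threshold chosen for illustration.
    """
    corrected = list(read)
    for i in range(len(read)):
        votes = vote_base(counts, read, i, k)
        if not votes:
            continue
        best, best_votes = votes.most_common(1)[0]
        if best != read[i] and best_votes >= min_ratio * votes[read[i]]:
            corrected[i] = best
    return "".join(corrected)
```

In this sketch, a read is corrected by building the counts once with count_kmers(reads, k) and then calling correct_read(counts, read, k). Swapping the Counter for a disk-resident index would change only the lookup inside box_query, which is where a structure such as the BoND-tree would be expected to help.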

Original language: English (US)
Title of host publication: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 503-504
Number of pages: 2
ISBN (Electronic): 9781450338530
DOIs: https://doi.org/10.1145/2808719.2811429
State: Published - Sep 9 2015
Externally published: Yes
Event: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015 - Atlanta, United States
Duration: Sep 9 2015 to Sep 12 2015

Other

Other: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Country: United States
City: Atlanta
Period: 9/9/15 to 9/12/15

Fingerprint

Error correction
DNA Sequence Analysis
DNA
Politics
Metadata
Scalability
Genes
Genome
Data storage equipment

Keywords

  • Bioinformatics
  • Disk index tree
  • Sequencing error correction

ASJC Scopus subject areas

  • Software
  • Health Informatics
  • Computer Science Applications
  • Biomedical Engineering

Cite this

Gu, Y., Liu, X., Zhu, Q., Dong, Y., Brown, C., & Pramanik, S. (2015). A new method for DNA sequencing error verification and correction via an on-disk index tree. In BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 503-504). Association for Computing Machinery, Inc. https://doi.org/10.1145/2808719.2811429

A new method for DNA sequencing error verification and correction via an on-disk index tree. / Gu, Yarong; Liu, Xianying; Zhu, Qiang; Dong, Youchao; Brown, Charles; Pramanik, Sakti.

BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc, 2015. p. 503-504.

Gu, Y, Liu, X, Zhu, Q, Dong, Y, Brown, C & Pramanik, S 2015, A new method for DNA sequencing error verification and correction via an on-disk index tree. in BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc, pp. 503-504, 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015, Atlanta, United States, 9/9/15. https://doi.org/10.1145/2808719.2811429
Gu Y, Liu X, Zhu Q, Dong Y, Brown C, Pramanik S. A new method for DNA sequencing error verification and correction via an on-disk index tree. In BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc. 2015. p. 503-504 https://doi.org/10.1145/2808719.2811429
Gu, Yarong ; Liu, Xianying ; Zhu, Qiang ; Dong, Youchao ; Brown, Charles ; Pramanik, Sakti. / A new method for DNA sequencing error verification and correction via an on-disk index tree. BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc, 2015. pp. 503-504
@inproceedings{5321c6829ab14c3cb92812752c2a1074,
title = "A new method for DNA sequencing error verification and correction via an on-disk index tree",
abstract = "Existing sequencing error correction techniques demand large expensive memory space. In this work, we introduce a new disk-based sequencing error correction method to solve the problem. The key idea is to utilize a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. With the BoND-tree, a set of special box queries to retrieve the relevant k-mers and their counts are efficiently processed. A comprehensive voting mechanism is adopted to determine and correct an erroneous base in a genome sequence. Experiments demonstrate that the proposed method is quite promising in verifying and correcting sequencing errors in terms of accuracy and scalability. Copyright is held by the author/owner(s).",
keywords = "Bioinformatics, Disk index tree, Sequencing error correction",
author = "Yarong Gu and Xianying Liu and Qiang Zhu and Youchao Dong and Charles Brown and Sakti Pramanik",
year = "2015",
month = "9",
day = "9",
doi = "10.1145/2808719.2811429",
language = "English (US)",
pages = "503--504",
booktitle = "BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - A new method for DNA sequencing error verification and correction via an on-disk index tree

AU - Gu, Yarong

AU - Liu, Xianying

AU - Zhu, Qiang

AU - Dong, Youchao

AU - Brown, Charles

AU - Pramanik, Sakti

PY - 2015/9/9

Y1 - 2015/9/9

N2 - Existing sequencing error correction techniques demand large expensive memory space. In this work, we introduce a new disk-based sequencing error correction method to solve the problem. The key idea is to utilize a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. With the BoND-tree, a set of special box queries to retrieve the relevant k-mers and their counts are efficiently processed. A comprehensive voting mechanism is adopted to determine and correct an erroneous base in a genome sequence. Experiments demonstrate that the proposed method is quite promising in verifying and correcting sequencing errors in terms of accuracy and scalability. Copyright is held by the author/owner(s).

AB - Existing sequencing error correction techniques demand large expensive memory space. In this work, we introduce a new disk-based sequencing error correction method to solve the problem. The key idea is to utilize a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. With the BoND-tree, a set of special box queries to retrieve the relevant k-mers and their counts are efficiently processed. A comprehensive voting mechanism is adopted to determine and correct an erroneous base in a genome sequence. Experiments demonstrate that the proposed method is quite promising in verifying and correcting sequencing errors in terms of accuracy and scalability. Copyright is held by the author/owner(s).

KW - Bioinformatics

KW - Disk index tree

KW - Sequencing error correction

UR - http://www.scopus.com/inward/record.url?scp=84963595668&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963595668&partnerID=8YFLogxK

U2 - 10.1145/2808719.2811429

DO - 10.1145/2808719.2811429

M3 - Conference contribution

AN - SCOPUS:84963595668

SP - 503

EP - 504

BT - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

PB - Association for Computing Machinery, Inc

ER -