Abstract
The vast increase in DNA sequencing capacity over the last decade has quickly turned biology into a data-intensive science. Nevertheless, current sequencers such as the Illumina HiSeq have high random per-base error rates, which makes sequencing error correction an indispensable step for many sequence analysis applications. Most existing error correction methods require large amounts of expensive memory, which limits their scalability to large datasets. In this paper, we present a new disk-based method, called DiskBQcor, for sequencing error correction. DiskBQcor stores the k-mers of genome sequencing data, along with their associated metadata, on inexpensive disk and uses a disk-based index tree to efficiently process special box queries that retrieve relevant k-mers and their occurrence frequencies. It then applies a comprehensive voting mechanism, and where needed an efficient binary-encoding-based assembly technique, to verify and correct an erroneous base in a genome sequence under various conditions. Our experiments demonstrate that the proposed method is quite promising for error verification and correction of genome sequencing data on disk.
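The abstract names three ingredients: k-mer frequencies, box queries over k-mers, and voting. The Python sketch here illustrates one way they can fit together, under loudly stated assumptions: the k-mer store is an in-memory `Counter` rather than a disk file, the box query is a linear scan rather than DiskBQcor's disk-based index tree, and the voting rule (sum the frequencies of matching k-mers per candidate base, and replace the base only when the winner has at least `min_ratio` times the current base's votes) is a generic k-mer-frequency scheme, not the paper's actual mechanism. All function names, `K = 5`, and `min_ratio = 2.0` are hypothetical. The binary-encoding-based assembly step is not modeled.

```python
from collections import Counter

K = 5  # hypothetical k-mer length; real tools use much larger k


def count_kmers(reads, k=K):
    """Count every k-mer across all reads (in-memory stand-in for an on-disk k-mer store)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def box_query(counts, box):
    """Return {k-mer: count} for k-mers whose i-th base is in box[i] for every i.

    Linear scan for illustration only; DiskBQcor answers box queries with a disk-based index tree.
    """
    return {kmer: c for kmer, c in counts.items()
            if all(base in allowed for base, allowed in zip(kmer, box))}


def vote_base(read, pos, counts, k=K):
    """Tally frequency-weighted votes for each candidate base at read[pos]."""
    votes = Counter()
    # Every k-length window of the read that covers pos contributes one box query.
    for start in range(max(0, pos - k + 1), min(pos, len(read) - k) + 1):
        offset = pos - start
        # Box: each position fixed to the read's base, except the one under test (any base).
        box = [{b} for b in read[start:start + k]]
        box[offset] = set("ACGT")
        for kmer, c in box_query(counts, box).items():
            votes[kmer[offset]] += c
    return votes


def correct_read(read, counts, k=K, min_ratio=2.0):
    """Replace a base when another base out-votes it by at least min_ratio (hypothetical rule).

    Votes are always computed on the original read, so corrections do not cascade.
    """
    corrected = list(read)
    for pos in range(len(read)):
        votes = vote_base(read, pos, counts, k)
        if not votes:
            continue
        best, best_votes = votes.most_common(1)[0]
        if best != read[pos] and best_votes >= min_ratio * max(votes[read[pos]], 1):
            corrected[pos] = best
    return "".join(corrected)


if __name__ == "__main__":
    reference = "ACGTTGCATGCA"
    reads = [reference] * 5 + ["ACGTTGAATGCA"]  # last read has a C->A error at position 6
    counts = count_kmers(reads)
    print(correct_read("ACGTTGAATGCA", counts))  # ACGTTGCATGCA
```

With five copies of the reference read and one read carrying a C→A substitution at position 6, each of the five windows covering that position votes 5 for C and 1 for A, so C wins 25 to 5 and the base is corrected. A practical tool would also have to handle reverse complements, clusters of nearby errors, and quality scores, none of which this sketch attempts.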
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 |
Publisher | The International Society for Computers and Their Applications (ISCA) |
Pages | 69-76 |
Number of pages | 8 |
ISBN (Electronic) | 9781943436033 |
State | Published - Jan 1 2016 |
Externally published | Yes |
Event | 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 - Las Vegas, United States; Duration: Apr 4 2016 → Apr 6 2016 |
Other
Other | 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 |
---|---|
Country | United States |
City | Las Vegas |
Period | 4/4/16 → 4/6/16 |
Keywords
- Algorithm
- Box query
- DNA sequencing
- Error correction
- Index tree
ASJC Scopus subject areas
- Artificial Intelligence
- Computational Theory and Mathematics
- Information Systems
- Biomedical Engineering
- Electrical and Electronic Engineering
- Health Informatics