prince-ph0en1x / QAGS

Quantum Accelerated Genome Sequencing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

QAGS

Quantum Accelerated Genome Sequencing

Info

This repository hosts accompanying codes for Aritra Sarkar's Master thesis in Computer Engineering at Delft University of Technology.

The research was conducted in the period 01-11-2017 to 22-06-2018 at the Quantum Computer Architecture lab under the supervision of Prof. Dr. Koen Bertels, Dr. Carmen G. Almudever and Dr. Zaid Al-Ars.

The QCA lab is affiliated to QuTech and the Quantum & Computer Engineering department, as a collaboration between the Faculty of Applied Sciences and the Faculty of Electrical Engineering, Mathematics and Computer Science.

Thesis: Quantum Algorithms for pattern-matching in genomic sequences

Brief

Fast sequencing and analysis of (microorganism, plant or human) genomes will open up new vistas in fields like personalised medication, food yield and epigenetic research. Current state-of-the-art DNA pattern matching techniques use heuristic algorithms on computing clusters of CPUs, GPUs and FPGAs. With genomic data set to eclipse social and astronomical big data streams within a decade, the alternate computing paradigm of quantum computation is explored to accelerate genome-sequence reconstruction. The inherent parallelism of quantum superposition of states is harnessed to design a quantum kernel for accelerating the search process. The project explores the merger of these two domains and identifies ways to fit these together to design a genome-sequence analysis pipeline with quantum algorithmic speedup. The design of a genome-sequence analysis pipeline with a quantum kernel is tested with a proof-of-concept demonstration using a quantum simulator.

With small-scale quantum processors transitioning from experimental physics labs to industrial products, these processors allow us to efficiently compute important algorithms in various fields. In this research, we propose a quantum algorithm to address the challenging field of big data processing for genome sequence analysis. We describes an architecture-aware implementation of a quantum algorithm for sub-sequence alignment. A new algorithm named QiMAM (quantum indexed multi-associative memory) is proposed, that uses approximate pattern-matching based on Hamming distances. QiMAM extends the Grover's search algorithm in two ways to allow for: (1) approximate matches needed for read errors in genomics, and (2) a distributed search for multiple solutions over quantum encoding of DNA sequences. This approach gives a quadratic speedup over the classical algorithm. A full implementation of the algorithm is provided and verified using the OpenQL compiler and QX simulator framework. This represents a first exploration towards a full-stack quantum accelerated genome sequencing pipeline design.

Algorithms

The codes are written in Python, using the OpenQL quantum compiler and Qxelerator simulator encapsulator for QX quantum simulator. The MATLAB/Octave helper codes are derived from Qubiter CSD to help in decomposing an arbitrary unitary into quantum unitary gates. Work is in progress in integrating unitary decomposition as part of OpenQL.

About

Quantum Accelerated Genome Sequencing

License:Apache License 2.0


Languages

Language:Python 58.8%Language:MATLAB 41.1%Language:Shell 0.1%