youjunl / DNA-Tree-based-Clustering

DNA Tree based Clustering

Introduction

This project contains the code and benchmark results from my Master's thesis, "Clustering for DNA Storage". The method builds on the tree-based clustering algorithm Clover, and the project provides both Python and C++ implementations of it.

Usage

Clustering

Clustering requires a read file in TXT format, in which each line has the form:

[Index] [Read]

An example of the read file:

1 ATAAGGG
2 AAAAGGG
3 AAAAGGG
4 GGACCTA
5 GGACCTA
...
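
Before running the clustering, it can help to check that a read file follows this layout. The following is a minimal sketch, not part of the project: the function name `parse_reads` is hypothetical, and it assumes integer indexes, whitespace as the separator, and reads over the alphabet A, C, G, T.

```python
import io


def parse_reads(lines, alphabet="ACGT"):
    """Parse '[Index] [Read]' lines into a list of (index, read) tuples.

    Assumptions (not confirmed by the project): indexes are integers,
    fields are separated by whitespace, reads use only `alphabet`.
    """
    reads = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '[Index] [Read]', got {line!r}")
        index, read = parts
        invalid = set(read) - set(alphabet)
        if invalid:
            raise ValueError(f"line {line_no}: unexpected characters {sorted(invalid)}")
        reads.append((int(index), read))
    return reads


# The sample reads shown above, held in memory instead of a file.
sample = io.StringIO("1 ATAAGGG\n2 AAAAGGG\n3 AAAAGGG\n4 GGACCTA\n5 GGACCTA\n")
reads = parse_reads(sample)
print(len(reads), "reads parsed")
```

To check a real file, pass an open file object instead of the `io.StringIO` sample.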

Run the clustering with the command:

python -m clust.main -I [input file] -O [output file]

For example:

# test data
python -m clust.main -I testdata/toClust.txt -O output_file

The clustering result lists each original index together with the label of the cluster assigned to it:

[Index],[Label of cluster]

An example of clustering result:

1,1
2,1
3,1
4,2
5,2
...
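
A result in this format can be loaded and grouped by cluster with a few lines of Python. This is an illustrative sketch only: `load_labels` and `group_by_cluster` are hypothetical helper names, and both indexes and labels are assumed to be integers.

```python
import io
from collections import defaultdict


def load_labels(lines):
    """Read '[Index],[Label of cluster]' lines into an {index: label} dict."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        index, label = line.split(",")
        labels[int(index)] = int(label)
    return labels


def group_by_cluster(labels):
    """Invert {index: label} into {label: [indexes...]}."""
    clusters = defaultdict(list)
    for index, label in labels.items():
        clusters[label].append(index)
    return dict(clusters)


# The sample result shown above, held in memory instead of a file.
sample = io.StringIO("1,1\n2,1\n3,1\n4,2\n5,2\n")
labels = load_labels(sample)
clusters = group_by_cluster(labels)
for label, members in sorted(clusters.items()):
    print(f"cluster {label}: {members}")
```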

Benchmark

To evaluate a clustering result, a TXT file that gives the correct cluster label for each index is needed:

[Index],[Label of cluster]

An example of the correct clustering file:

1,1
2,1
3,1
4,2
5,2
...

Two different metrics, accuracy and purity, can be computed with the commands:

python tools/computeAcc.py [Accurate indexes] [Cluster result 1] [Cluster result 2] ... [Output file]
python tools/computePur.py [Accurate indexes] [Cluster result 1] [Cluster result 2] ... [Output file]

For example:

python tools/computeAcc.py testdata/toClust.txt output_file_1.txt output_file_2.txt CompareResults.txt
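
For intuition about what a purity score measures, here is a minimal sketch of the textbook definition: each predicted cluster is credited with its most common true label, and the credited reads are divided by the total. This is an assumption for illustration and may differ from what tools/computePur.py actually computes; the function name `purity` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(truth, predicted):
    """Textbook cluster purity over the indexes present in both mappings.

    truth, predicted: dicts mapping index -> cluster label.
    """
    shared = truth.keys() & predicted.keys()
    if not shared:
        raise ValueError("no common indexes between truth and prediction")
    # For every predicted cluster, count how often each true label occurs.
    per_cluster = defaultdict(Counter)
    for index in shared:
        per_cluster[predicted[index]][truth[index]] += 1
    # Credit each predicted cluster with its majority true label.
    majority_total = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority_total / len(shared)


# Hypothetical labels: read 3 is wrongly placed in cluster 2.
truth = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
predicted = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
score = purity(truth, predicted)
print(score)  # 0.8
```

Purity alone rewards over-splitting (one read per cluster scores 1.0), which is one reason to report it alongside a second metric.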
