hh1985 / sequence2vec

Vectorize the nucleotide sequences, so that the similarity of the sequences can be represented as similarity of vectors.


kmer2vec is an algorithm that provides an embedding strategy for short sequences (k <= 10), so that reads which share overlapping sequences tend to be "close" in N-dimensional space.
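As an illustration only, the sketch below shows one way the input for such an embedding could be prepared: a read is split into overlapping k-mers, and neighbouring k-mers are paired as (center, context) examples in the style of word2vec skip-gram training. The word2vec analogy is an assumption drawn from the name; the function names `kmers` and `skipgram_pairs`, the `window` parameter, and the example read are hypothetical and not taken from this repository.

```python
from typing import Iterator, List, Tuple


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `seq`, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for tokens at most `window` positions apart."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


# Hypothetical example read, chosen only for illustration.
read = "ACGTAC"
tokens = kmers(read, k=3)
pairs = list(skipgram_pairs(tokens, window=1))
```

Pairs like these could then be fed to any skip-gram trainer, so that k-mers which frequently appear next to each other end up with similar vectors.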


Read2Vec is an algorithm that provides a read-level embedding strategy (read length n = 150, 250, or 300), so that reads which share overlapping sequences tend to be "close" in N-dimensional space.
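To make "close" concrete, here is a minimal sketch that stands in for a learned embedding with a plain k-mer count profile and compares reads by cosine similarity. This is a swapped-in baseline, not the Read2Vec method itself; `kmer_profile`, `cosine`, and the three short reads are hypothetical and far shorter than real reads.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence


def kmer_profile(read: str, k: int = 3) -> List[float]:
    """Count each of the 4**k possible k-mers in `read` as a fixed-length vector."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    vocab = ("".join(p) for p in product("ACGT", repeat=k))
    return [float(counts[kmer]) for kmer in vocab]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy reads: read_b starts with the last 9 bases of read_a,
# while read_c shares no 3-mer with read_a.
read_a = "ACGTACGTTGCA"
read_b = "TACGTTGCAGGA"
read_c = "GGGGCCCCGGGG"

sim_overlap = cosine(kmer_profile(read_a), kmer_profile(read_b))
sim_unrelated = cosine(kmer_profile(read_a), kmer_profile(read_c))
```

In this toy setting `sim_overlap` is larger than `sim_unrelated`, which is the property the embedding is meant to preserve for real reads.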


contig2vec is an algorithm that clusters and "assembles" reads based on graphical features of the reads and biological evidence (e.g. paired-end reads). Segments from the same genome should be "close" to each other.
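The clustering idea can be sketched under simplifying assumptions: reads are joined into one group when they share enough k-mers (a crude stand-in for sequence-derived features) or when they are declared paired-end mates. The union-find approach, the names `kmer_set` and `cluster_reads`, the thresholds, and the toy reads are all hypothetical and are not the repository's implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def kmer_set(read: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in `read`."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def cluster_reads(
    reads: Dict[str, str],
    mate_pairs: Iterable[Tuple[str, str]],
    k: int = 4,
    min_shared: int = 3,
) -> List[Set[str]]:
    """Group read names by shared k-mers and by paired-end links (union-find)."""
    parent = {name: name for name in reads}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    names = list(reads)
    ksets = {name: kmer_set(reads[name], k) for name in names}

    # Sequence evidence: link reads that share at least `min_shared` k-mers.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(ksets[a] & ksets[b]) >= min_shared:
                union(a, b)

    # Biological evidence: mates of a paired-end read go into the same group.
    for a, b in mate_pairs:
        union(a, b)

    groups: Dict[str, Set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy data: r1 and r2 overlap; r3 and r4 are linked only as mates.
reads = {
    "r1": "ACGTACGTTGCA",
    "r2": "TACGTTGCAGGA",
    "r3": "GGGGCCCCGGGG",
    "r4": "TTTTAAAATTTT",
}
clusters = cluster_reads(reads, mate_pairs=[("r3", "r4")])
```

Here `r1` and `r2` are grouped through shared k-mers, while `r3` and `r4` are grouped only because of the mate-pair link, showing how the two kinds of evidence complement each other.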

