jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

How to use the utility function kmer2seq

ryao-mdanderson opened this issue

Dear DNABERT author:

Could you please provide a Python example of how to call kmer2seq to convert a text file (for example examples/sample_data/pre/6_3k.txt) back to its original sequence?

Thank you very much,
Rong

Hi Rong,

I hope this finds you well.

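from Bio import SeqIO

def seq2kmer(seq, k):
    # Split a sequence into overlapping k-mers (stride 1), one per line.
    kmer = [seq[x:x+k] for x in range(len(seq)+1-k)]
    kmers = "\n".join(kmer)
    return kmers

k = 6  # k-mer length; 6 is only a placeholder, set it to the value you need

# 'a' appends to test.txt if it already exists.
file_object = open('test.txt', 'a')

for record in SeqIO.parse("/path/to/.fasta", "fasta"):
    seq = str(record.seq)
    kmers = seq2kmer(seq, k)
    file_object.write(kmers + "\n")

file_object.close()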

Best regards,
Chao
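
The reply above covers the forward direction (sequence to k-mers). For the reverse direction asked about in the question, here is a minimal sketch. It assumes the k-mers overlap with stride 1, as `seq2kmer` produces them, and that each line of the input holds one sequence as whitespace-separated k-mers. The helper name `kmers_to_sequence` is hypothetical and is not presented as DNABERT's own `kmer2seq`.

```python
def seq2kmer(seq, k):
    """Same splitting as in the reply above, returned as a list."""
    return [seq[x:x + k] for x in range(len(seq) + 1 - k)]


def kmers_to_sequence(kmers):
    """Rebuild a sequence from overlapping k-mers (stride 1).

    Hypothetical helper for illustration: `kmers` is a list of
    equal-length strings in their original order.
    """
    if not kmers:
        return ""
    # Take the first base of every k-mer except the last,
    # then append the last k-mer in full.
    return "".join(kmer[0] for kmer in kmers[:-1]) + kmers[-1]


# Round trip on a small made-up sequence.
kmers = seq2kmer("ATGCGTAC", 3)
restored = kmers_to_sequence(kmers)

# Assumed file layout: one sequence per line, k-mers separated by spaces.
lines = ["ATG TGC GCG", "CCA CAT"]  # stand-in for lines read from a file
sequences = [kmers_to_sequence(line.split()) for line in lines]
```

To apply this to a real file, the `lines` list would be replaced by iterating over the opened file, provided the file follows the assumed layout.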

@alexwu66666 Hello Chao,

I really appreciate your help!

I just tried the code below on my sequence, but the output .txt file, written to the default folder, is empty. Can you advise?

import os
import pandas as pd
import numpy as np

def seq2kmer(seq, k):
    kmer = [seq[x:x+k] for x in range(len(seq)+1-k)]
    kmers = "\n".join(kmer)
    return kmers
from Bio import SeqIO
file_object = open('test2.txt', 'a')
for record in SeqIO.parse("C:/Users/sfang/tert.fasta", "fasta"):
    kmers=seq2kmer(seq,9)
    file_object.write(kmers + "\n")
    print(kmers)
file_object.close()
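
One thing to check in the snippet above: `seq` is never assigned inside the loop, whereas Chao's version sets `seq = str(record.seq)` before calling `seq2kmer`. Two other cases would leave the file without k-mers: the parser yields no records, or a sequence is shorter than `k`, in which case `seq2kmer` returns an empty string. The sketch below shows one way to make these cases visible. The helper `write_kmers` and the sample data are hypothetical, and plain strings stand in for the parsed records so the example runs without Biopython.

```python
import os
import tempfile


def seq2kmer(seq, k):
    """Overlapping k-mers of `seq` (stride 1), one per line, as in the thread."""
    return "\n".join(seq[x:x + k] for x in range(len(seq) + 1 - k))


def write_kmers(sequences, out_path, k):
    """Write the k-mers of each sequence to out_path.

    Returns the number of sequences written, so the caller can tell
    whether anything reached the file.
    """
    written = 0
    # "w" starts from an empty file; the thread's code uses "a" (append).
    with open(out_path, "w") as handle:
        for seq in sequences:
            if len(seq) < k:
                continue  # too short to yield any k-mer
            handle.write(seq2kmer(seq, k) + "\n")
            written += 1
    return written


# With Biopython, the sequences would come from the parsed records, e.g.
#   sequences = (str(record.seq) for record in SeqIO.parse(fasta_path, "fasta"))
sequences = ["ATGCGTACGTTAGC", "ACGT"]  # stand-in data for illustration
out_path = os.path.join(tempfile.mkdtemp(), "kmers.txt")
count = write_kmers(sequences, out_path, 9)
print(count, "sequence(s) written to", out_path)
```

Using a `with` block also ensures the file is flushed and closed even if an error occurs partway through the loop.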

@alexwu66666 Hello Chao,

You can close this issue. Thank you for providing an example script.