seq-lang / seq

A high-performance, Pythonic language for bioinformatics

Home page: https://seq-lang.org

Dump objects to files

dawe opened this issue

Hi all, I’m about to explore the Seq language for the first time. What would you suggest as the best strategy for dumping objects into files so that they can be read by a separate, pure-Python program? I would like to fill a matrix of all possible k-mers from my files, and I would like to avoid text files since they will be quite large. Is pickling supported?

Yes. There is an example in the workshop section of the docs. Hopefully that fits your use case.

Hey @dawe, we do support pickling right now (the API should be the same as Python's), although Seq's pickle files are not yet compatible with Python's pickle format.

When it comes to k-mers, they will be written in a 2-bit encoded format, so you could read the raw bytes of the pickle files in Python and decode them with, e.g., struct.unpack(). Let me know if this makes sense and would work in your case.
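A minimal Python sketch of the decoding step. The thread does not spell out the bit layout, so this assumes the codes A=0, C=1, G=2, T=3, with the first base in the most significant bits and the 2-byte field stored little-endian. Check those assumptions against a k-mer whose sequence you already know before relying on it. The helper names decode_kmer and unpack_kmer are hypothetical.

```python
import struct

# ASSUMPTION: 2-bit codes A=0, C=1, G=2, T=3, first base in the most
# significant bits. Verify against a k-mer whose sequence you know.
BASES = "ACGT"


def decode_kmer(value: int, k: int) -> str:
    """Turn an integer holding k 2-bit bases back into a DNA string."""
    bases = []
    for i in range(k - 1, -1, -1):
        # Shift the i-th 2-bit field down and mask off everything else.
        bases.append(BASES[(value >> (2 * i)) & 0b11])
    return "".join(bases)


def unpack_kmer(buf: bytes, k: int) -> str:
    """Decode a 2-byte k-mer field (k <= 8); little-endian order is ASSUMED."""
    (value,) = struct.unpack("<H", buf)
    return decode_kmer(value, k)


print(unpack_kmer(b"\x1b\x00", 4))  # prints ACGT
```

A 2-byte field only holds up to 16 bits, i.e. k <= 8, which covers Kmer[7]; longer k-mers would need a wider format code.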

@arshajii what would be the structure of a pickled dict[Kmer[7], int] then? Anyhow, since in the end I may want to produce something in a matrix format, I wonder whether it would be feasible to use pydef/pyimport together with h5py and dump my data into HDF5 files.
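For the matrix goal, one option is to keep k-mers in their integer form. A 2-bit encoded k-mer is an integer in [0, 4**k), so it can index a column directly, giving 4**7 = 16,384 columns for Kmer[7]. The sketch below assumes each file's counts have already been read into (kmer_bits, count) pairs; the function names are hypothetical. The resulting rows could then be stacked and written out, for example with h5py's create_dataset, which is not shown here.

```python
from array import array


def counts_to_row(records, k):
    """Build one dense row of length 4**k from (kmer_bits, count) pairs.

    A 2-bit encoded k-mer is an integer in [0, 4**k), so it is used
    directly as the column index; no conversion to strings is needed.
    """
    row = array("q", [0]) * (4 ** k)  # signed 64-bit counts, all zero
    for kmer_bits, count in records:
        row[kmer_bits] += count
    return row


def build_matrix(per_file_records, k):
    """One row per input file, one column per possible k-mer."""
    return [counts_to_row(records, k) for records in per_file_records]


# Toy example with k = 4 (256 columns). Under the ASSUMED A=0, C=1, G=2, T=3
# mapping, column 27 corresponds to ACGT and column 0 to AAAA.
matrix = build_matrix([[(27, 5)], [(0, 2), (27, 1)]], k=4)
print(len(matrix), len(matrix[0]), matrix[0][27], matrix[1][0])  # prints 2 256 5 2
```

The column order does not depend on the base mapping; the mapping only matters when translating a column index back into a sequence.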

Sorry for the delay on this. If you want to read the dict[Kmer[7],int] with an external program, I would recommend pickling each item individually:

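# d: dict[Kmer[7], int]; jar: the output file handle
for k,v in d.items():
    pickle(k, jar)  # k-mer: 2-bit encoded, padded to 2 bytes
    pickle(v, jar)  # count: 8-byte integer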

Then you can just read the binary contents: each record is 2 bytes for the 2-bit encoded k-mer (7 bases × 2 bits = 14 bits, padded to 2 bytes) followed by 8 bytes for the integer. An alternative is to use Python directly, as you suggest, although there's no easy conversion for k-mers, so they would probably have to be converted to strings.
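A sketch of the Python side for the item-by-item approach above. The 2 + 8 byte record size comes from this reply; the rest is assumed: little-endian byte order, a signed 64-bit count, no header or separators between records, and the A=0, C=1, G=2, T=3 base codes. The compressed flag covers the case where the file was written through gzip; drop it if the file is plain. All function names here are hypothetical.

```python
import gzip
import io
import struct

# ASSUMED record layout: 2-byte unsigned k-mer followed by an 8-byte signed
# integer, little-endian, with nothing between records.
RECORD = struct.Struct("<Hq")  # "<" disables alignment, so RECORD.size == 10
BASES = "ACGT"  # ASSUMED 2-bit codes: A=0, C=1, G=2, T=3


def decode_kmer(value, k):
    """Turn k 2-bit bases (first base in the high bits) into a DNA string."""
    return "".join(BASES[(value >> (2 * i)) & 0b11] for i in range(k - 1, -1, -1))


def read_counts(stream, k):
    """Read all (k-mer, count) records from a binary stream into a dict."""
    data = stream.read()
    if len(data) % RECORD.size:
        raise ValueError("file length is not a whole number of records")
    return {decode_kmer(bits, k): count for bits, count in RECORD.iter_unpack(data)}


def read_counts_file(path, k, compressed=False):
    """Open a plain or gzip-compressed file and read its records."""
    opener = gzip.open if compressed else open
    with opener(path, "rb") as f:
        return read_counts(f, k)


# Usage with an in-memory buffer standing in for the pickle file.
buf = io.BytesIO(RECORD.pack(27, 5) + RECORD.pack(0, 2))
print(read_counts(buf, 4))  # prints {'ACGT': 5, 'AAAA': 2}
```

Reading the whole file and using iter_unpack keeps the loop simple; for very large files, reading RECORD.size bytes at a time would bound memory use instead.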