miurahr / pyppmd

pyppmd provides classes and functions for compressing and decompressing text data, using the PPM (Prediction by Partial Matching) compression algorithm, variations H and I.2. It provides an API similar to Python's zlib/bz2/lzma modules.

Home Page: https://pyppmd.readthedocs.io/en/latest/


About API


If there are several variants of PPMd, consider a unified API that covers all of them.

PPMd was designed for text compression, so pyppmd could provide some text-related API features, such as encoding/decoding and line-break handling.

If you need help, feel free to @ me.

Yes, you are right: PPMd is designed for text. I'm wondering what the Pythonic way to do that is.
I'm also interested in supporting PPMd variants: not only ver.H but also ver.I, which is used in RAR.

Ppmd7Encoder and Ppmd7Decoder are actually a low-level API. We can define a high-level API that accepts an encoding, a line-break style, and a PPMd version.
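
To make the idea concrete, here is a rough sketch of what such a high-level text layer might look like. Everything in it is an assumption for discussion: `compress_text`, `decompress_text`, the `BACKENDS` table, and the `variant` keys are hypothetical names, not pyppmd's API, and `zlib` stands in for the PPMd encoder only so the sketch runs without pyppmd installed.

```python
import zlib

# Hypothetical backend table. A real implementation would map PPMd variants
# (e.g. ver.H, ver.I) to encoder/decoder pairs; zlib is only a stand-in here.
BACKENDS = {
    "zlib-standin": (zlib.compress, zlib.decompress),
}


def compress_text(text, *, encoding="utf-8", newline="\n", variant="zlib-standin"):
    """Normalize line breaks, encode the text, then compress the bytes."""
    # Collapse every line-break style to "\n" first ...
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # ... then convert to the style the caller asked for.
    if newline != "\n":
        normalized = normalized.replace("\n", newline)
    compress, _ = BACKENDS[variant]
    return compress(normalized.encode(encoding))


def decompress_text(data, *, encoding="utf-8", variant="zlib-standin"):
    """Decompress the bytes and decode them back to a string."""
    _, decompress = BACKENDS[variant]
    return decompress(data).decode(encoding)


if __name__ == "__main__":
    blob = compress_text("first line\r\nsecond line\r", newline="\n")
    print(repr(decompress_text(blob)))
```

Keeping the encoding and line-break handling in a thin wrapper like this leaves the low-level encoder and decoder classes byte-oriented, so each PPMd variant only needs to supply a compress/decompress pair.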

This requires being very familiar with both PPMd and Python, and taking the user's perspective.
Designing the API may be the most challenging part of the work.

The API should return the source data size when it converts the encoding. The decoder needs the exact size because the PPMd decoder is designed to keep producing data even after the input is exhausted.
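
One possible way to carry that size is to store it next to the compressed payload. The sketch below is only an illustration under stated assumptions: the 8-byte length header, `pack_with_size`, and `unpack_with_size` are hypothetical, and `zlib` again stands in for the PPMd codec so the example is runnable on its own. The point it shows is that the recorded size must be the byte length after encoding, not the number of characters.

```python
import struct
import zlib

# Hypothetical framing: an 8-byte little-endian length, then the payload.
HEADER = struct.Struct("<Q")


def pack_with_size(text, encoding="utf-8"):
    """Encode the text and prefix the compressed payload with the raw byte size."""
    raw = text.encode(encoding)
    # zlib is a stand-in for the PPMd encoder.
    return HEADER.pack(len(raw)) + zlib.compress(raw)


def unpack_with_size(blob, encoding="utf-8"):
    """Read the recorded size, decompress, and return exactly that many bytes."""
    (size,) = HEADER.unpack_from(blob)
    raw = zlib.decompress(blob[HEADER.size:])
    # A decoder that cannot detect the end of its input would be asked for
    # exactly `size` bytes; with the stand-in we just cut at that length.
    return raw[:size].decode(encoding)


if __name__ == "__main__":
    blob = pack_with_size("héllo\n")
    print(HEADER.unpack_from(blob)[0], repr(unpack_with_size(blob)))
```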

Because Ppmd8 uses an end mark in v0.14.0 and later, the decompressor can determine the length of the output data.
So the compressor API does not need to return the raw data size when a string is passed and encoded to UTF-8; it just returns the compressed data.

The data format of the end mark is the same as, and a subset of, the one used in RAR compression.

As for a unified API, there is still room for improvement.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days