The return value of Ppmd8_DecodeSymbol should be stored in an int, not an unsigned char, and decoding should stop if it is <0
cielavenir opened this issue · comments
Describe the bug
https://github.com/miurahr/pyppmd/blob/v0.15.2/src/ext/_ppmdmodule.c#L1451 The return value of Ppmd8_DecodeSymbol should be stored in an int, not an unsigned char.
If Ppmd8_DecodeSymbol returns <0, decoding should terminate immediately. Note that -1 as a char and -1 as an int are different values (this is the same reason the result of fgetc() should be stored in an int, not a char).
This will help:
- to identify eof when -1 is returned
- to identify data error when -2 is returned
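To illustrate the point, here is a minimal sketch in C. It does not use the real Ppmd8 API: `FakeDecoder` and `fake_decode_symbol` are hypothetical stand-ins that mimic the return convention described above (a byte value on success, -1 at end of stream, -2 on a data error). It contrasts a loop that keeps the result in an int with one that truncates it to unsigned char.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PPMd decoder; NOT the real Ppmd8 API.
   It replays the values in src and then reports end of stream. */
typedef struct {
    const int *src;   /* values to return: 0..255, or a negative code */
    size_t pos;
    size_t len;
} FakeDecoder;

/* Mimics the assumed convention: 0..255 = byte, -1 = EOF, -2 = data error. */
static int fake_decode_symbol(FakeDecoder *d)
{
    if (d->pos >= d->len)
        return -1;
    return d->src[d->pos++];
}

/* Result kept in an int: negative codes stay visible and stop the loop.
   *status is 0 if the buffer filled up, otherwise the negative code. */
static size_t decode_all(FakeDecoder *d, unsigned char *out, size_t cap,
                         int *status)
{
    size_t n = 0;
    *status = 0;
    while (n < cap) {
        int sym = fake_decode_symbol(d);
        if (sym < 0) {
            *status = sym;
            break;
        }
        out[n++] = (unsigned char)sym;
    }
    return n;
}

/* Result truncated to unsigned char: -1 becomes 0xFF and -2 becomes 0xFE,
   so they look like ordinary bytes and the loop runs until the buffer is full. */
static size_t decode_all_truncating(FakeDecoder *d, unsigned char *out,
                                    size_t cap)
{
    size_t n = 0;
    while (n < cap) {
        unsigned char sym = (unsigned char)fake_decode_symbol(d);
        out[n++] = sym;
    }
    return n;
}
```

With the int version, a genuine 0xFF byte in the stream and the -1 end-of-stream code remain distinguishable; with the truncating version, the caller gets trailing 0xFF bytes and no way to tell where the data ended.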
Additional context
- You can determine EOF from the return value of Ppmd8_DecodeSymbol; do you really need the "end marker"?
- Since you are going to rewrite the buffering, I would be glad if you could take a look at this.
Additional context 2
Actually, I wrote a PPMd handler for zipfile, but the additional end marker creates an incompatible stream.
Could you propose your change on top of #33 (once it is merged)?
It is OK to drop the "end marker".
It is in https://github.com/cielavenir/pyppmd/commits/UseEndmarkProperly2 , but see #33 (comment) first.
Note: The implementation of the "end mark" is partially compatible with what unrar does.
#33 has been changed accordingly. (It does not touch the end mark code.)
First, both 7z and rar use PPMdH, and zip uses PPMdI.
PPMd actually has quite a few dialects: although 7z and rar both use PPMdH, their range coders are different. They also differ from the original PPMd, which unpacks the PPMd archive format described at http://www.compression.ru/ds/ .
So each program is free to handle the endmark differently [edit: across software].
Maybe adding an option to set the endmark handling is an idea.
Now you can pass an endmark option to the encoder and decoder.
ref: #39
@miurahr I confirmed the compatibility of current pyppmd and I was able to release https://pypi.org/project/zipfile-ppmd/ as zipfile module patcher, which works the same way as zipfile-zstd. Thank you.