miurahr / pyppmd

pyppmd provides classes and functions for compressing and decompressing text data using the PPM (Prediction by Partial Matching) compression algorithm, variations H and I.2. It provides an API similar to Python's zlib/bz2/lzma modules.

Home Page: https://pyppmd.readthedocs.io/en/latest/

PPMd8 decompression produces output one byte shorter than expected

miurahr opened this issue

Describe the bug
PPMd8 decompression with restore_method set to PPMD8_RESTORE_METHOD_CUT_OFF, on Python 3.8 on Windows, sometimes produces output that is one byte shorter than expected.

To Reproduce
https://github.com/miurahr/pyppmd/runs/3322119476?check_suite_focus=true

Environment (please complete the following information):

  • OS: Windows 10, Linux
  • Python: CPython 3.7
  • project version: v0.16.0

Additional context

2021-08-13T12:53:50.1591006Z tests/test_ppmd8.py::test_ppmd8_encode_decode[1048576-1] FAILED
2021-08-13T12:53:50.1591880Z 
2021-08-13T12:53:50.1592431Z ================================== FAILURES ===================================
2021-08-13T12:53:50.1593036Z _____________________ test_ppmd8_encode_decode[1048576-1] _____________________
2021-08-13T12:53:50.1593498Z 
2021-08-13T12:53:50.1594801Z tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-unknown/pytest-0/test_ppmd8_encode_decode_104851')
2021-08-13T12:53:50.1596160Z mem_size = 1048576, restore_method = 1
2021-08-13T12:53:50.1597050Z 
2021-08-13T12:53:50.1598047Z     @pytest.mark.parametrize(
2021-08-13T12:53:50.1598706Z         "mem_size, restore_method",
2021-08-13T12:53:50.1599210Z         [
2021-08-13T12:53:50.1599804Z             (8 << 20, pyppmd.PPMD8_RESTORE_METHOD_RESTART),
2021-08-13T12:53:50.1600803Z             (8 << 20, pyppmd.PPMD8_RESTORE_METHOD_CUT_OFF),
2021-08-13T12:53:50.1601474Z             (1 << 20, pyppmd.PPMD8_RESTORE_METHOD_RESTART),
2021-08-13T12:53:50.1602274Z             (1 << 20, pyppmd.PPMD8_RESTORE_METHOD_CUT_OFF),
2021-08-13T12:53:50.1602817Z         ],
2021-08-13T12:53:50.1603210Z     )
2021-08-13T12:53:50.1603705Z     @pytest.mark.timeout(20)
2021-08-13T12:53:50.1604464Z     def test_ppmd8_encode_decode(tmp_path, mem_size, restore_method):
2021-08-13T12:53:50.1605133Z         length = 0
2021-08-13T12:53:50.1605742Z         m = hashlib.sha256()
2021-08-13T12:53:50.1606433Z         with testdata_path.joinpath("10000SalesRecords.csv").open("rb") as f:
2021-08-13T12:53:50.1607227Z             with tmp_path.joinpath("target.ppmd").open("wb") as target:
2021-08-13T12:53:50.1608142Z                 enc = pyppmd.Ppmd8Encoder(6, mem_size, restore_method=restore_method, endmark=True)
2021-08-13T12:53:50.1608921Z                 data = f.read(READ_BLOCKSIZE)
2021-08-13T12:53:50.1609443Z                 while len(data) > 0:
2021-08-13T12:53:50.1609930Z                     m.update(data)
2021-08-13T12:53:50.1610522Z                     length += len(data)
2021-08-13T12:53:50.1611141Z                     target.write(enc.encode(data))
2021-08-13T12:53:50.1611786Z                     data = f.read(READ_BLOCKSIZE)
2021-08-13T12:53:50.1612414Z                 target.write(enc.flush())
2021-08-13T12:53:50.1613540Z         shash = m.digest()
2021-08-13T12:53:50.1614660Z         m2 = hashlib.sha256()
2021-08-13T12:53:50.1615351Z         assert length == 1237262
2021-08-13T12:53:50.1615937Z         length = 0
2021-08-13T12:53:50.1616543Z         with tmp_path.joinpath("target.ppmd").open("rb") as target:
2021-08-13T12:53:50.1617283Z             with tmp_path.joinpath("target.csv").open("wb") as out:
2021-08-13T12:53:50.1618171Z                 dec = pyppmd.Ppmd8Decoder(6, mem_size, restore_method=restore_method, endmark=True)
2021-08-13T12:53:50.1619007Z                 data = target.read(READ_BLOCKSIZE)
2021-08-13T12:53:50.1619588Z                 while len(data) > 0 or not dec.eof:
2021-08-13T12:53:50.1620140Z                     res = dec.decode(data)
2021-08-13T12:53:50.1620632Z                     m2.update(res)
2021-08-13T12:53:50.1621116Z                     out.write(res)
2021-08-13T12:53:50.1621603Z                     length += len(res)
2021-08-13T12:53:50.1622161Z                     data = target.read(READ_BLOCKSIZE)
2021-08-13T12:53:50.1622711Z >       assert length == 1237262
2021-08-13T12:53:50.1623150Z E       assert 1237261 == 1237262
2021-08-13T12:53:50.1623560Z E         +1237261
2021-08-13T12:53:50.1623925Z E         -1237262
2021-08-13T12:53:50.1624222Z 
2021-08-13T12:53:50.1624726Z tests\test_ppmd8.py:105: AssertionError
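
The assertion above only reports that the decoded length is one byte short; it does not say which byte is missing. A small helper along the following lines could narrow that down by comparing the original file contents with the decoded output. This is only a sketch: `first_difference` and `describe_mismatch` are hypothetical names, not part of pyppmd or its test suite, and the sample bytes are made up for illustration.

```python
from typing import Optional


def first_difference(expected: bytes, actual: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical.

    If one input is a strict prefix of the other, the returned offset is the
    length of the shorter input.
    """
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


def describe_mismatch(expected: bytes, actual: bytes) -> str:
    """Summarize how the decoded output differs from the original data."""
    pos = first_difference(expected, actual)
    if pos is None:
        return "identical"
    if pos == len(actual) and len(actual) < len(expected):
        # Everything decoded so far matches; only the tail is missing.
        missing = len(expected) - len(actual)
        return f"output is a prefix of the input; last {missing} byte(s) missing"
    return (
        f"first difference at offset {pos} "
        f"(expected {len(expected)} bytes, got {len(actual)})"
    )


if __name__ == "__main__":
    # Made-up data standing in for the CSV contents and the decoder output.
    original = b"abcdef"
    truncated = original[:-1]
    print(describe_mismatch(original, truncated))
```

If the report says the output is a prefix of the input, the decoder dropped the final byte, which would point at end-of-stream handling rather than mid-stream corruption.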

Hopefully #54 fixes this issue.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days