miurahr / py7zr

7zip in python3 with ZStandard, PPMd, LZMA2, LZMA1, Delta, BCJ, BZip2, and Deflate compressions, and AES encryption.

Home Page: https://pypi.org/project/py7zr/

py7zr aborts with error when decompressing PPMd-compressed 7-Zip archives

tausendsassa opened this issue

py7zr aborts with an error when decompressing PPMd-compressed 7-Zip archives.

Code:
import py7zr
archive = py7zr.SevenZipFile("testdata-x5-ppmd.7z", mode='r')
archive.extractall()
archive.close()

Error:
Traceback (most recent call last):
File "censored\test.py", line 941, in
archive.extractall()
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 959, in extractall
self._extract(path=path, return_dict=False, callback=callback)
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 621, in _extract
self.worker.extract(
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 1181, in extract
self.extract_single(
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 1263, in extract_single
raise e
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 1260, in extract_single
self._extract_single(fp, files, src_end, q, skip_notarget)
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 1320, in _extract_single
crc32 = self.decompress(fp, f.folder, obfp, f.uncompressed, f.compressed, src_end)
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\py7zr.py", line 1369, in decompress
tmp = decompressor.decompress(fp, out_remaining)
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\compressor.py", line 657, in decompress
tmp = self._decompress(data, max_length)
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\compressor.py", line 611, in _decompress
data = decompressor.decompress(data, max_length) # always give max_length for lzma1
File "censored\AppData\Roaming\Python\Python310\site-packages\py7zr\compressor.py", line 295, in decompress
return self.decoder.flush(max_length)
ValueError: Decompression failed.

Environment:
Windows 10
Python 3.9.5 and 3.10.2
py7zr 0.17.4

Test data:
testdata-x5-ppmd.7z
testdata-x7-ppmd.7z
testdata-x9-ppmd.7z
The attached example archives were created with the 7-Zip 21.07 command-line tool using the parameters "-m0=PPMd -mx=5" / "-m0=PPMd -mx=7" / "-m0=PPMd -mx=9".
testdata.zip
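The archive-creation step above can be scripted for reproduction. This is a minimal sketch that only assembles the same `7z a` command lines quoted above (`-m0=PPMd` with `-mx=5/7/9`); it assumes a `7z` executable is on PATH, and the source folder name `testdata` and the helper names `ppmd_archive_cmds` / `create_ppmd_archives` are hypothetical.

```python
import subprocess


def ppmd_archive_cmds(source, exe="7z", levels=(5, 7, 9)):
    """Build one 7-Zip command line per compression level (hypothetical helper)."""
    # "a" adds files to an archive, "-m0=PPMd" selects PPMd as the first
    # coder, and "-mx=N" sets the compression level.
    return [
        [exe, "a", "-m0=PPMd", f"-mx={level}", f"testdata-x{level}-ppmd.7z", source]
        for level in levels
    ]


def create_ppmd_archives(source, exe="7z"):
    """Run each command; check=True raises CalledProcessError if 7z fails."""
    for cmd in ppmd_archive_cmds(source, exe):
        subprocess.run(cmd, check=True)
```

For example, `create_ppmd_archives("testdata")` would produce `testdata-x5-ppmd.7z`, `testdata-x7-ppmd.7z` and `testdata-x9-ppmd.7z`, matching the names of the attached files.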

Hints:
The larger the packed archives are, the more files are unpacked before the error occurs. The higher the compression level of the 7-Zip archive, the fewer files are unpacked before extraction aborts (this probably corresponds to the file size). For larger 7-Zip archives consisting of multiple blocks, a few files at the beginning of each block are unpacked before extraction aborts.
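One way to quantify this pattern is to count how many archive members actually reach the disk before the exception. A minimal sketch, assuming py7zr is installed: `SevenZipFile`, `getnames()` and `extractall(path=...)` are py7zr APIs, while `missing_after_extract` and `probe_archive` are hypothetical helpers. A file that was only partially written before the abort still counts as present, so the reported number is an upper bound.

```python
from pathlib import Path


def missing_after_extract(names, outdir):
    """Return the archive member names that do not exist under outdir."""
    root = Path(outdir)
    return [name for name in names if not (root / name).exists()]


def probe_archive(archive_path, outdir):
    """Extract an archive and report how far extraction got (hypothetical helper)."""
    import py7zr  # imported lazily so missing_after_extract works without py7zr

    error = None
    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        names = archive.getnames()
        try:
            archive.extractall(path=outdir)
        except ValueError as exc:  # "Decompression failed." in the report above
            error = exc
    missing = missing_after_extract(names, outdir)
    print(f"{archive_path}: {len(names) - len(missing)}/{len(names)} entries present, error={error!r}")
    return missing
```

Running `probe_archive` on each of the x5/x7/x9 test archives should make the relationship between compression level and the number of extracted files visible.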

This is caused by bugs in both py7zr and pyppmd.
The compressed data is smaller than the buffer size, so all of the input data is handed to the decompressor at once, and decompression of the second and later files has to proceed without any new input data. py7zr calls flush when extracting the second file, and decompression of the third file then fails.

In the same situation, pyppmd does not succeed either when it receives all of the input data at once and has to produce multiple outputs; it seems to get stuck in an infinite loop.
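The decompressor behaviour described above (all input handed over in one call, later files produced without new input) can be illustrated with the standard-library `lzma` module, used here only as a stand-in for pyppmd; this is not py7zr's or pyppmd's code. With `lzma.LZMADecompressor`, a nonnegative `max_length` caps each call's output and buffers the unconsumed input internally, so subsequent outputs are drained by calling `decompress(b"", max_length)` rather than flushing. The helper name `drain_in_chunks` is hypothetical.

```python
import lzma


def drain_in_chunks(compressed: bytes, sizes: list[int]) -> list[bytes]:
    """Split one compressed stream into consecutive outputs of the given sizes.

    All input is given to the decompressor in the first call, as when the
    compressed data is smaller than the read buffer. Later outputs come from
    calls with empty input, which drain what the decompressor has buffered.
    """
    dec = lzma.LZMADecompressor()
    pending = compressed  # input not yet handed to the decompressor
    outputs = []
    for size in sizes:
        out = b""
        while len(out) < size:
            if dec.eof:
                raise EOFError("stream ended before all outputs were produced")
            chunk = dec.decompress(pending, max_length=size - len(out))
            pending = b""  # the rest of the input is now buffered inside dec
            if not chunk:
                raise EOFError("no further output available")
            out += chunk
        outputs.append(out)
    return outputs
```

For example, three payloads concatenated and compressed with `lzma.compress` come back as three separate byte strings when their lengths are passed as `sizes`, even though the compressed data is supplied only once.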

@tausendsassa Thank you for the report, and sorry that this feature is unusable. As a workaround, I recommend calling the p7zip command from Python to extract the archives.
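A minimal sketch of that workaround, assuming a `7z` or `7za` executable (from p7zip or 7-Zip) is on PATH. It uses the standard 7-Zip switches `x` (extract with full paths), `-o<dir>` (output directory, no space after `-o`) and `-y` (assume yes on all prompts); the helper names are hypothetical and not part of py7zr.

```python
import shutil
import subprocess


def build_7z_extract_cmd(archive, outdir, exe="7z"):
    """Assemble a 7-Zip extraction command line (hypothetical helper)."""
    return [exe, "x", str(archive), f"-o{outdir}", "-y"]


def extract_with_7z(archive, outdir):
    """Extract an archive with the external 7z tool instead of py7zr."""
    exe = shutil.which("7z") or shutil.which("7za")
    if exe is None:
        raise FileNotFoundError("neither 7z nor 7za was found on PATH")
    # check=True raises CalledProcessError if 7z reports an error
    subprocess.run(build_7z_extract_cmd(archive, outdir, exe), check=True)
```

Calling `extract_with_7z("testdata-x5-ppmd.7z", "out")` would then extract the archive into `out` without going through py7zr's PPMd decoder.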

The ValueError is raised in the PyPPMd library, which means its internal state has become unexpected.
PyPPMd is written in C/C++, and fixing the issue requires a large change.

@tausendsassa, could you try #420?