OverflowError when scanning certain HF repos

Question

csmizzle opened this issue a year ago

Describe the bug
When running the modelscan -hf command from the CLI, response.read() in /modelscan/tools/utils.py throws an OverflowError. A simple fix would be to catch the error and read the file in chunks.

Something like:

...
    return response.read()
except OverflowError:
    # Reading the whole body in one call overflowed, so read bounded chunks instead.
    chunk_size = 16 * 1024
    data = bytearray()
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the body is exhausted
            break
        data.extend(chunk)
    return bytes(data)
...
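
A self-contained sketch of the same idea, for anyone who wants to try it outside modelscan. The name read_all_chunked is hypothetical, and an io.BytesIO object stands in for the HTTP response; the only assumption is that the response exposes read(n). The traceback below suggests the overflow comes from asking the SSL layer for the whole body in one read, so keeping every read small should avoid it.

```python
import io

CHUNK_SIZE = 16 * 1024  # bytes per read, same value as the snippet above


def read_all_chunked(response, chunk_size=CHUNK_SIZE):
    """Read a file-like response to the end using bounded reads."""
    data = bytearray()
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        data.extend(chunk)
    return bytes(data)


# Stand-in for an HTTP response: any object with read(n) works here.
fake_response = io.BytesIO(b"x" * 40_000)
payload = read_all_chunked(fake_response)
print(len(payload))  # 40000
```

Note that this still holds the whole file in memory; it only avoids the single oversized read.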

To Reproduce
Steps to reproduce the behavior:

1. Use arguments '-hf'
2. With model 'stabilityai/stable-diffusion-xl-base-1.0'
3. See error

(modelscan-py3.9) ➜  modelscan git:(main) modelscan -hf stabilityai/stable-diffusion-xl-base-1.0
Exception: signed integer is greater than maximum
Traceback (most recent call last):
  File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/cli.py", line 72, in cli
    modelscan.scan_huggingface_model(huggingface)
  File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/modelscan.py", line 76, in scan_huggingface_model
    data = io.BytesIO(_http_get(url))
  File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/tools/utils.py", line 110, in _http_get
    return _http_get(response.headers["Location"])
  File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/tools/utils.py", line 115, in _http_get
    return response.read()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 472, in read
    s = self._safe_read(self.length)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 613, in _safe_read
    data = self.fp.read(amt)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: signed integer is greater than maximum

Expected behavior
Large files should be streamed in chunks instead of being read in a single call.

Environment (please complete the following information):

mac OS 13.4.1 (22F82)
Modelscan Version 0.0.0

chris · Answer 1 · Sun Aug 20 2023 23:36:47 GMT+0800 (China Standard Time)

happy to take this on.

chris · Answer 2 · Mon Aug 21 2023 09:29:01 GMT+0800 (China Standard Time)

The fix above gets past the overflow but leads to the output below. The magic number check is failing for the files scanned as pytorch models; other scans are working. Will continue to test.

(modelscan-py3.9) ➜  modelscan git:(43-overflowerror) ✗ modelscan -hf stabilityai/stable-diffusion-xl-base-1.0
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder_2/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_decoder/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_encoder/openvino_model.bin using pytorch model scan

--- Summary ---

 No issues found! 🎉

--- Errors --- 

Error 1:
The following error was raised during a pytorch scan: 
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder/openvino_model.bin

Error 2:
The following error was raised during a pytorch scan: 
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder_2/openvino_model.bin

Error 3:
The following error was raised during a pytorch scan: 
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/openvino_model.bin

Error 4:
The following error was raised during a pytorch scan: 
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_decoder/openvino_model.bin

Error 5:
The following error was raised during a pytorch scan: 
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_encoder/openvino_model.bin
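
For context on what a magic number check does, here is a minimal sketch that inspects the first bytes of a file before choosing a scanner. It is not modelscan's actual check; sniff_format and its return values are hypothetical. It relies on two known signatures: zip archives (the format torch.save has written by default since PyTorch 1.6) begin with PK\x03\x04, and pickles of protocol 2 or later begin with the byte \x80 followed by the protocol number. A file such as openvino_model.bin that matches neither would be reported as unknown rather than handed to a pytorch scan.

```python
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature of a zip archive
PICKLE_PROTO = 0x80        # PROTO opcode that opens a pickle (protocol 2+)


def sniff_format(header: bytes) -> str:
    """Classify a file by its first bytes as 'zip', 'pickle', or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    # A protocol 2-5 pickle starts with the PROTO opcode and the protocol number.
    if len(header) >= 2 and header[0] == PICKLE_PROTO and 2 <= header[1] <= 5:
        return "pickle"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a local file and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(4))
```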

Faisal Khan · Answer 3 · Mon Aug 21 2023 22:58:44 GMT+0800 (China Standard Time)

@csmizzle Thanks for raising the issue and for the detailed error report. There is a PR (#39) under review that fixes HuggingFace (HF) model downloads. It replaces the http library with the requests library to fetch the models. The requests library takes care of a lot of the issues with downloading HF models, including URL escaping, redirects, and overflow. I ran the PR against the stabilityai/stable-diffusion-xl-base-1.0 model and there was no overflow issue. However, the invalid magic number seems to be an unrelated problem.

If you are happy with the solution in #39, we can close this issue once the PR is approved and merged. The invalid magic number problem can get its own issue.
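
For reference, a streamed download with requests could look roughly like the sketch below. This is not the code from #39; stream_to_file and download are hypothetical names, and the 1 MiB chunk size and 30 second timeout are arbitrary choices. It uses requests.get(..., stream=True) and Response.iter_content so that the body is written to disk in bounded pieces rather than read in one call.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def stream_to_file(response, dest, chunk_size=CHUNK_SIZE):
    """Write a streamed response body to dest and return the bytes written.

    `response` is anything exposing requests' iter_content(chunk_size=...).
    """
    written = 0
    with open(dest, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def download(url, dest):
    """Hypothetical helper: fetch url with requests and stream it to dest."""
    # Third-party dependency, imported here so stream_to_file stays stdlib-only.
    import requests

    # stream=True defers the body download; redirects are followed by default.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return stream_to_file(response, dest)
```

Writing to a file rather than an in-memory buffer also keeps memory use flat for multi-gigabyte models, at the cost of needing scratch disk space.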

chris · Answer 4 · Mon Aug 21 2023 23:39:30 GMT+0800 (China Standard Time)

@iamfaisalkhan awesome, thanks for the update. nice work in #39. will close this up! magic number error likely goes away with the proposed work in #39. will avoid opening new issue for now.