[Bug]: OCR notebook crashes
dnth opened this issue
Dickson Neoh commented
What happened?
I ran the OCR notebook and it crashed with the following error.
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-5-dbb82ee0fc02> in <cell line: 2>()
1 fd = fastdup.create(input_dir='./frames')
----> 2 fd.run(bounding_box='ocr', num_images=300)
15 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 105369: invalid start byte
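The message comes from Python's UTF-8 decoder rather than from the OCR step itself. In UTF-8, bytes 0x80 to 0xBF are continuation bytes that are only valid after a lead byte, so a lone 0x96 is rejected as an "invalid start byte". Possible sources, neither confirmed for this issue, include a multi-byte character whose first byte was cut off, or Windows-1252 text, where 0x96 is an en dash. The standard-library sketch below classifies the byte and scans raw bytes for the first invalid sequence. `utf8_byte_role`, `find_invalid_utf8` and `scan_file` are hypothetical helpers, and this report does not give the path of the CSV that fastdup passes to `pd.read_csv` (`local_file` in the traceback). The sketch assumes that rescanning the raw file is more reliable than trusting position 105369, because pandas decodes the file in chunks and the reported position may be relative to a chunk.

```python
from pathlib import Path
from typing import Optional, Tuple


def utf8_byte_role(b: int) -> str:
    """Classify one byte by the role it can play in a UTF-8 stream."""
    if b < 0x80:
        return "ascii"          # 0xxxxxxx: a complete one-byte character
    if b < 0xC0:
        return "continuation"   # 10xxxxxx: only valid after a lead byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "never valid"    # bytes that can never appear in UTF-8
    return "lead"               # starts a 2-, 3- or 4-byte character


def find_invalid_utf8(data: bytes, context: int = 20) -> Optional[Tuple[int, int, bytes]]:
    """Return (byte offset, 1-based line number, nearby bytes) for the first
    sequence that is not valid UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start                              # offset into `data` (the whole file)
        line_no = data.count(b"\n", 0, offset) + 1      # line containing the bad byte
        nearby = data[max(0, offset - context): offset + context]
        return offset, line_no, nearby
    return None


def scan_file(path: str) -> None:
    """Hypothetical usage: point this at the CSV that fastdup hands to pd.read_csv."""
    result = find_invalid_utf8(Path(path).read_bytes())
    if result is None:
        print(f"{path}: valid UTF-8")
    else:
        offset, line_no, nearby = result
        print(f"{path}: invalid byte at offset {offset}, line {line_no}: {nearby!r}")


print(utf8_byte_role(0x96))                       # continuation
print(find_invalid_utf8(b"a,b\nx,caf\x96\n"))     # (9, 2, b'a,b\nx,caf\x96\n')
print("U+%04X" % ord(b"\x96".decode("cp1252")))   # U+2013 (en dash in Windows-1252)
```

When reading such a file directly, `pd.read_csv(path, encoding_errors="replace")` (pandas 1.3 and later) substitutes U+FFFD for undecodable bytes. That only masks the symptom, and it is not an option for the `read_csv` call inside fastdup itself.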
What did you expect to see?
Run complete.
What version of fastdup were you running on?
1.31
What version of Python were you running on?
Python 3.10
Operating System
Google Colab
Reproduction steps
Run this Colab notebook:
https://colab.research.google.com/drive/1XvRkN4tCcW3K9J4UlUBIqm8Z2orJfFvp?usp=sharing
Relevant log output
Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
fastdup C++ info received: 2023-07-31 07:12:38 [INFO] Going to loop over dir frames
2023-07-31 07:12:38 [INFO] Found total 300 images to run on, 300 train, 0 test, name list 300, counter 300
2023-07-31 07:12:59 [WARNING] Failed to find bounding box in 1.0,402.0,52.0,402.0,52.0,464.0,1.0,464.0,米,0.5845534801483154 -1543484592 stof
2023-07-31 07:12:59 [WARNING] Failed to find bounding box in 21.0,749.0,549.0,584.0,562.0,627.0,35.0,792.0,Nee het ging niet in 1keer,0.8404790163040161 -1543486816 stof
2023-07-31 07:13:11 [WARNING] Failed to find bounding box in 429.0,987.0,562.0,989.0,562.0,1007.0,429.0,1005.0,@marc.koolen,0.9491152763366699 -1543479200 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 80.0,129.0,235.0,129.0,235.0,144.0,80.0,144.0,"Replytomvo97scomment",0.9465373754501343 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 78.0,149.0,304.0,149.0,304.0,167.0,78.0,167.0,"Someonesgetting that",0.8849286437034607 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 80.0,174.0,206.0,174.0,206.0,195.0,80.0,195.0,hormone gut,0.8970210552215576 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 62.0,222.0,328.0,222.0,328.0,247.0,62.0,247.0,People who get all of,0.8900323510169983 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 29.0,265.0,295.0,265.0,295.0,289.0,29.0,289.0,their knowledgefrom,0.9228520393371582 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 26.0,306.0,185.0,304.0,185.0,331.0,27.0,333.0,social media,0.9554478526115417 -1677714224 stof
2023-07-31 07:13:32 [WARNING] Failed to find bounding box in 472.0,983.0,563.0,987.0,562.0,1009.0,471.0,1006.0,@caitlinjs,0.9428111910820007 -1677718464 stof
2023-07-31 07:13:40 [WARNING] Failed to find bounding box in 29.0,319.0,507.0,231.0,515.0,274.0,37.0,362.0,Wanneer je terug komt,0.9444238543510437 -1543473408 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 78.0,134.0,271.0,134.0,271.0,153.0,78.0,153.0,"Replytojessca4765scomment",0.9421570897102356 -1543484432 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 74.0,161.0,320.0,161.0,320.0,179.0,74.0,179.0,whydoesthe metalplates,0.9307484030723572 -1543452848 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 79.0,186.0,314.0,186.0,314.0,204.0,79.0,204.0,feelheavierthanthenom,0.9704270362854004 -1543452848 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 78.0,210.0,201.0,210.0,201.0,228.0,78.0,228.0,metalplates?,0.9400894641876221 -1543452848 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,137.0,271.0,137.0,271.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9017161130905151 -1675931072 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 76.0,161.0,320.0,161.0,320.0,179.0,76.0,179.0,whydoesthemetal plates,0.912723183631897 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,186.0,314.0,186.0,314.0,204.0,79.0,204.0,feel heavierthanthenom,0.9386289119720459 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,212.0,200.0,212.0,200.0,227.0,79.0,227.0,metalplates?,0.9113922715187073 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 377.0,987.0,564.0,989.0,564.0,1008.0,377.0,1006.0,@timmytimmadome,0.9204213619232178 -1677718464 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 79.0,137.0,270.0,137.0,270.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9417281746864319 -1543485952 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 74.0,161.0,320.0,161.0,320.0,179.0,74.0,179.0,whydoesthemetalplates,0.9604387879371643 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 80.0,186.0,314.0,186.0,314.0,204.0,80.0,204.0,feelheavierthanthenom,0.9448582530021667 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 79.0,210.0,201.0,210.0,201.0,228.0,79.0,228.0,metalplates?,0.917719304561615 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 376.0,986.0,563.0,987.0,563.0,1008.0,376.0,1007.0,@timmytimmadome,0.9878508448600769 -1543479344 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 79.0,137.0,270.0,137.0,270.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9305316209793091 -1677701520 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 76.0,161.0,320.0,161.0,320.0,179.0,76.0,179.0,whydoesthemetal plates,0.903861939907074 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 80.0,186.0,314.0,186.0,314.0,204.0,80.0,204.0,feelheavierthanthenom,0.9454621076583862 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 79.0,212.0,200.0,212.0,200.0,227.0,79.0,227.0,metalplates?,0.9111447930335999 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 376.0,986.0,563.0,987.0,563.0,1008.0,376.0,1007.0,@tim
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/fastdup/sentry.py", line 132, in inner_function
ret = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fastdup/fastdup_controller.py", line 533, in run
if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
File "/usr/local/lib/python3.10/dist-packages/fastdup/__init__.py", line 679, in run
out_df = pd.read_csv(local_file)[[
File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine
return mapping[engine](f, **self.options)
File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 547, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 636, in pandas._libs.parsers.TextReader._get_header
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1965, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 105369: invalid start byte
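Each "Failed to find bounding box" warning in the log above carries one OCR result as a comma-separated row. Judging only from these lines, the layout appears to be eight coordinates (four x,y corner points of a possibly rotated quadrilateral), then the recognized text (sometimes quoted), then a confidence score. That interpretation is an assumption, not documented fastdup behavior. The minimal sketch below parses such a row, taken without the trailing number and `stof`, and derives an axis-aligned box. `OcrRow`, `parse_ocr_row` and `to_xywh` are hypothetical names.

```python
import csv
from typing import List, NamedTuple, Tuple


class OcrRow(NamedTuple):
    corners: List[Tuple[float, float]]  # assumed: four (x, y) points of the text quadrilateral
    text: str
    confidence: float


def parse_ocr_row(row: str) -> OcrRow:
    """Parse 'x1,y1,...,x4,y4,text,confidence' (layout assumed from the log)."""
    fields = next(csv.reader([row]))            # csv handles quoted text fields
    coords = [float(v) for v in fields[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))
    text = ",".join(fields[8:-1])               # rejoin unquoted text that itself contained commas
    return OcrRow(corners, text, float(fields[-1]))


def to_xywh(corners: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x, y, width, height) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


row = parse_ocr_row(
    "21.0,749.0,549.0,584.0,562.0,627.0,35.0,792.0,Nee het ging niet in 1keer,0.8404790163040161"
)
print(row.text)              # Nee het ging niet in 1keer
print(to_xywh(row.corners))  # (21.0, 584.0, 541.0, 208.0)
```

Using `csv.reader` rather than a plain `split(",")` keeps quoted text such as `"Replytomvo97scomment"` intact and strips its quotes.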
Danny Bickson commented
Fixed in 1.32
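To confirm that an environment already has the fix, the installed package version can be compared against 1.32 with the standard library's `importlib.metadata`. The sketch below is minimal, and its helper names are hypothetical. The parser only looks at the leading numeric parts of the version string, which simplifies full PEP 440 ordering (for example, it treats `1.32rc1` like `1.32`).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIX_VERSION = (1, 32)  # the maintainer reports the crash is fixed in 1.32


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '1.31' -> (1, 31)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def fastdup_has_ocr_fix() -> Optional[bool]:
    """True/False for the installed fastdup, or None if fastdup is not installed."""
    try:
        installed = metadata.version("fastdup")
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= FIX_VERSION


print(version_tuple("1.31") >= FIX_VERSION)   # False
print(version_tuple("1.32") >= FIX_VERSION)   # True
```

If `fastdup_has_ocr_fix()` returns False, upgrading with `pip install --upgrade fastdup` and then restarting the Python process should pick up the fix.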