visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: OCR notebook crashes

dnth opened this issue

What happened?

I ran the OCR notebook and it crashed with the following error.

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-dbb82ee0fc02> in <cell line: 2>()
      1 fd = fastdup.create(input_dir='./frames')
----> 2 fd.run(bounding_box='ocr', num_images=300)

15 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 105369: invalid start byte
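
The error means that the CSV being read contains a byte (0x96) that is not valid UTF-8 at that position. As a rough diagnostic sketch, not fastdup's own code, the snippet below scans raw bytes and reports where UTF-8 decoding fails. The helper name `find_invalid_utf8` and the `sample` bytes are made up for illustration; which file fastdup actually passes to `pd.read_csv` is not shown in the traceback, so no path is assumed here.

```python
def find_invalid_utf8(data: bytes, limit: int = 10) -> list[tuple[int, int]]:
    """Return up to `limit` (offset, byte value) pairs where UTF-8 decoding fails."""
    bad = []
    pos = 0
    while pos < len(data) and len(bad) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            offset = pos + err.start  # err.start is relative to the slice
            bad.append((offset, data[offset]))
            pos = offset + 1  # skip the offending byte and keep scanning
    return bad


# Hypothetical sample: valid UTF-8 text with one stray 0x96 byte in the middle.
sample = "ok,".encode("utf-8") + b"\x96" + ",米".encode("utf-8")

for offset, value in find_invalid_utf8(sample):
    print(f"invalid byte 0x{value:02x} at offset {offset}")

# Decoding with errors="replace" swaps each bad byte for U+FFFD instead of raising.
print(sample.decode("utf-8", errors="replace"))
```

On a real file the same helper could be called on `Path(some_csv).read_bytes()`. In Windows-1252, byte 0x96 is an en dash, so one plausible (unconfirmed) cause is OCR text written in a legacy encoding. For reading such a file directly, `pandas.read_csv` accepts `encoding` and, in pandas 1.3 and later, `encoding_errors` arguments.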

What did you expect to see?

Run complete.

What version of fastdup were you running on?

1.31

What version of Python were you running on?

Python 3.10

Operating System

Google Colab

Reproduction steps

Run this Colab notebook:
https://colab.research.google.com/drive/1XvRkN4tCcW3K9J4UlUBIqm8Z2orJfFvp?usp=sharing

Relevant log output

Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
fastdup C++ info received:  2023-07-31 07:12:38 [INFO] Going to loop over dir frames
2023-07-31 07:12:38 [INFO] Found total 300 images to run on, 300 train, 0 test, name list 300, counter 300 
2023-07-31 07:12:59 [WARNING] Failed to find bounding box in 1.0,402.0,52.0,402.0,52.0,464.0,1.0,464.0,米,0.5845534801483154 -1543484592 stof
2023-07-31 07:12:59 [WARNING] Failed to find bounding box in 21.0,749.0,549.0,584.0,562.0,627.0,35.0,792.0,Nee het ging niet in 1keer,0.8404790163040161 -1543486816 stof
2023-07-31 07:13:11 [WARNING] Failed to find bounding box in 429.0,987.0,562.0,989.0,562.0,1007.0,429.0,1005.0,@marc.koolen,0.9491152763366699 -1543479200 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 80.0,129.0,235.0,129.0,235.0,144.0,80.0,144.0,"Replytomvo97scomment",0.9465373754501343 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 78.0,149.0,304.0,149.0,304.0,167.0,78.0,167.0,"Someonesgetting that",0.8849286437034607 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 80.0,174.0,206.0,174.0,206.0,195.0,80.0,195.0,hormone gut,0.8970210552215576 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 62.0,222.0,328.0,222.0,328.0,247.0,62.0,247.0,People who get all of,0.8900323510169983 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 29.0,265.0,295.0,265.0,295.0,289.0,29.0,289.0,their knowledgefrom,0.9228520393371582 -1677714224 stof
2023-07-31 07:13:21 [WARNING] Failed to find bounding box in 26.0,306.0,185.0,304.0,185.0,331.0,27.0,333.0,social media,0.9554478526115417 -1677714224 stof
2023-07-31 07:13:32 [WARNING] Failed to find bounding box in 472.0,983.0,563.0,987.0,562.0,1009.0,471.0,1006.0,@caitlinjs,0.9428111910820007 -1677718464 stof
2023-07-31 07:13:40 [WARNING] Failed to find bounding box in 29.0,319.0,507.0,231.0,515.0,274.0,37.0,362.0,Wanneer je terug komt,0.9444238543510437 -1543473408 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 78.0,134.0,271.0,134.0,271.0,153.0,78.0,153.0,"Replytojessca4765scomment",0.9421570897102356 -1543484432 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 74.0,161.0,320.0,161.0,320.0,179.0,74.0,179.0,whydoesthe metalplates,0.9307484030723572 -1543452848 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 79.0,186.0,314.0,186.0,314.0,204.0,79.0,204.0,feelheavierthanthenom,0.9704270362854004 -1543452848 stof
2023-07-31 07:13:52 [WARNING] Failed to find bounding box in 78.0,210.0,201.0,210.0,201.0,228.0,78.0,228.0,metalplates?,0.9400894641876221 -1543452848 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,137.0,271.0,137.0,271.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9017161130905151 -1675931072 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 76.0,161.0,320.0,161.0,320.0,179.0,76.0,179.0,whydoesthemetal plates,0.912723183631897 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,186.0,314.0,186.0,314.0,204.0,79.0,204.0,feel heavierthanthenom,0.9386289119720459 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 79.0,212.0,200.0,212.0,200.0,227.0,79.0,227.0,metalplates?,0.9113922715187073 -1677718464 stof
2023-07-31 07:13:54 [WARNING] Failed to find bounding box in 377.0,987.0,564.0,989.0,564.0,1008.0,377.0,1006.0,@timmytimmadome,0.9204213619232178 -1677718464 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 79.0,137.0,270.0,137.0,270.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9417281746864319 -1543485952 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 74.0,161.0,320.0,161.0,320.0,179.0,74.0,179.0,whydoesthemetalplates,0.9604387879371643 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 80.0,186.0,314.0,186.0,314.0,204.0,80.0,204.0,feelheavierthanthenom,0.9448582530021667 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 79.0,210.0,201.0,210.0,201.0,228.0,79.0,228.0,metalplates?,0.917719304561615 -1543479344 stof
2023-07-31 07:14:05 [WARNING] Failed to find bounding box in 376.0,986.0,563.0,987.0,563.0,1008.0,376.0,1007.0,@timmytimmadome,0.9878508448600769 -1543479344 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 79.0,137.0,270.0,137.0,270.0,151.0,79.0,151.0,"Replytojessca4765scomment",0.9305316209793091 -1677701520 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 76.0,161.0,320.0,161.0,320.0,179.0,76.0,179.0,whydoesthemetal plates,0.903861939907074 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 80.0,186.0,314.0,186.0,314.0,204.0,80.0,204.0,feelheavierthanthenom,0.9454621076583862 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 79.0,212.0,200.0,212.0,200.0,227.0,79.0,227.0,metalplates?,0.9111447930335999 -1677708480 stof
2023-07-31 07:14:06 [WARNING] Failed to find bounding box in 376.0,986.0,563.0,987.0,563.0,1008.0,376.0,1007.0,@tim 

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/fastdup/sentry.py", line 132, in inner_function
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/fastdup/fastdup_controller.py", line 533, in run
    if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
  File "/usr/local/lib/python3.10/dist-packages/fastdup/__init__.py", line 679, in run
    out_df = pd.read_csv(local_file)[[
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1753, in _make_engine
    return mapping[engine](f, **self.options)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 79, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 547, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 636, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1965, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 105369: invalid start byte
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-dbb82ee0fc02> in <cell line: 2>()
      1 fd = fastdup.create(input_dir='./frames')
----> 2 fd.run(bounding_box='ocr', num_images=300)

15 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 105369: invalid start byte

Attach a screenshot [Optional]

No response

Contact Details [Optional]

No response

Fixed in 1.32
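
To check whether an environment already has the fixed release, a small sketch along these lines may help. It assumes the package is installed under the distribution name `fastdup` and that version strings are dotted numbers such as `1.31`; the helpers `version_tuple` and `has_fix` are illustrative and not part of fastdup.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '1.32' into (1, 32); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str, fixed_in: str = "1.32") -> bool:
    """True when the installed version is at least the release named in this issue."""
    return version_tuple(installed) >= version_tuple(fixed_in)


try:
    installed = version("fastdup")
    status = "includes the fix" if has_fix(installed) else "predates the fix"
    print(f"fastdup {installed} {status}")
except PackageNotFoundError:
    print("fastdup is not installed in this environment")
```

If the reported version is older than 1.32, upgrading the package and restarting the Colab runtime should pick up the fix.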