visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: AssertionError: For removing wrong labels created by the create_similarity_gallery() need to run stats_file=df where df is the output of create_similarity_gallery()

atmadeep opened this issue · comments

What happened?

Creating similarity Gallery:

df_1 = fastdup.create_similarity_gallery(similarity_file = "./fastdup_report/train/similarity.csv",
                                         save_path = "./fastdup_report/train/save_images",
                                         get_label_func=get_label,
                                         get_bounding_box_func=return_bbox,
                                         num_images = 100)

Removing the similar items:

fastdup.delete_or_retag_stats_outliers(stats_file=df_1,
                                       work_dir="./fastdup_report/train/",
                                       metric='score',
                                       lower_threshold=51, 
                                       dry_run=True)

Origin of Error:

file fastdup/__init__.py

if isinstance(stats_file, pd.DataFrame):
    assert isinstance(work_dir, str) and os.path.exists(work_dir), "When providing pandas dataframe need to set work_dir to point to fastdup work_dir"
    df = stats_file
else:
    df = load_stats(stats_file, work_dir, {})
if metric == "score" and metric not in df.columns:
    assert False, "For removing wrong labels created by the create_similarity_gallery() need to run stats_file=df where df is the output of create_similarity_gallery()"
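
The condition in the excerpt fires only when metric is "score" and the DataFrame has no column with that name. As a minimal sketch of that condition (the helper name missing_metric_column and the sample column names are hypothetical and not part of fastdup), the same test can be run against a list of column names before calling delete_or_retag_stats_outliers():

```python
def missing_metric_column(columns, metric="score"):
    """Return True when the quoted check would raise for this metric.

    Hypothetical helper, not part of fastdup. It mirrors the condition
    `metric == "score" and metric not in df.columns` from the excerpt.
    """
    return metric == "score" and metric not in set(columns)


# Made-up column names for illustration only; inspect the real DataFrame
# (e.g. list(df_1.columns)) to see what the gallery call actually returned.
print(missing_metric_column(["from", "to", "distance"]))
print(missing_metric_column(["from", "to", "score"]))
```

Passing df_1.columns to this helper shows whether the DataFrame returned by create_similarity_gallery() carries the score column the assertion looks for.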

What did you expect to see?

I expected all the similar images to be removed.

What version of fastdup were you running on?

1.111

What version of Python were you running on?

Python 3.8

Operating System

Ubuntu 20.04

Reproduction steps

  1. Create a similarity gallery with fastdup.create_similarity_gallery() and store the output in a DataFrame.
  2. Pass that DataFrame to fastdup.delete_or_retag_stats_outliers() to remove all the similar images.

Relevant log output

Traceback (most recent call last):
  File "/home/$USER/miniconda3/envs/env_4/lib/python3.8/site-packages/fastdup/__init__.py", line 1800, in delete_or_retag_stats_outliers
    assert False, "For removing wrong labels created by the create_similarity_gallery() need to run stats_file=df where df is the output of create_similarity_gallery()"
AssertionError: For removing wrong labels created by the create_similarity_gallery() need to run stats_file=df where df is the output of create_similarity_gallery()


Attach a screenshot [Optional]

No response

Contact Details [Optional]

aatmadeepaarya@gmail.com

Hi @atmadeep, please try running the similarity gallery with slice='label_score' and let us know if this works for you.