Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

[bug] duplicate image names can cause overwrite problem

linrongc opened this issue · comments

The download script saves each image to the path "save_dir + im_name", so any images that share a file name overwrite one another.

For example:
http://i.ytimg.com/vi/6rMwgpPSJyU/3.jpg 8486:1 8479:1 8473:1 5175:1 5170:1 1042:1 865:1 2:1

http://web.mit.edu/admissions/blogs/photos/jenny-whitesox/3.jpg 10591:1 1914:1 1897:1 1829:1 1054:1 1041:1 865:1 2:1

http://bp2.blogger.com/_u3lFqBksmrE/Rgoqe1STw-I/AAAAAAAACKI/sl1nY4Q4RAc/s400/3.jpg 9199:1 9170:1 8585:1 5177:1 5170:1 1042:1 865:1 2:1
....

They all have the same image name, 3.jpg.
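
To make the collision concrete, here is a minimal sketch that derives the file name from each of the three URLs above. The helper `image_name` and the `save_dir` value are assumptions for illustration, not the actual code in the download script.

```python
import os
from urllib.parse import urlparse

# The three URLs quoted in this issue.
urls = [
    "http://i.ytimg.com/vi/6rMwgpPSJyU/3.jpg",
    "http://web.mit.edu/admissions/blogs/photos/jenny-whitesox/3.jpg",
    "http://bp2.blogger.com/_u3lFqBksmrE/Rgoqe1STw-I/AAAAAAAACKI/sl1nY4Q4RAc/s400/3.jpg",
]


def image_name(url):
    """Return the last path component of a URL (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


save_dir = "images/"  # assumed value, for illustration only
names = [image_name(u) for u in urls]
paths = {save_dir + name for name in names}

print(names)       # every entry is "3.jpg"
print(len(paths))  # only one distinct save path for three different images
```

Because all three URLs map to the same save path, whichever download finishes last is the one that remains on disk.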

@linrongc Thanks for this feedback. We will try to fix it asap.

Using line_num as the save name, together with a mapping file, would be an easy fix.
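
The suggestion above could be sketched roughly as follows. This is only an illustration of the idea, not the change that was later made to the script: the function names, the 1-based numbering, the `.jpg` fallback extension, and the CSV mapping format are all assumptions.

```python
import csv
import io
import os
from urllib.parse import urlparse


def plan_downloads(lines):
    """Yield (save_name, url, labels) with a save name unique per input line."""
    for line_num, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines but keep counting them
        url, labels = parts[0], parts[1:]
        # Keep the original extension when the URL path has one.
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        yield f"{line_num}{ext}", url, labels


def write_mapping(entries, out):
    """Write a save_name -> url mapping as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["save_name", "url"])
    for save_name, url, _labels in entries:
        writer.writerow([save_name, url])


lines = [
    "http://i.ytimg.com/vi/6rMwgpPSJyU/3.jpg 8486:1 8479:1 2:1",
    "http://web.mit.edu/admissions/blogs/photos/jenny-whitesox/3.jpg 10591:1 2:1",
]
entries = list(plan_downloads(lines))
save_names = [entry[0] for entry in entries]

mapping = io.StringIO()
write_mapping(entries, mapping)
print(save_names)
print(mapping.getvalue())
```

Since the save name depends only on the line number, two URLs ending in the same file name can no longer collide, and the mapping file lets each saved image be traced back to its URL.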

This is a huge pitfall.

@linrongc good suggestion.

@linrongc @wwfnwg I have updated lines 27 and 28 of 'download_urls_multithreading.py'. Sorry for this bug; if you find any other bugs, please let me know. Thanks.

There is still a problem: some entries have exactly the same URL but different labels.
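
One way to check a URL list for this situation is to group the lines by URL and report any URL that appears with more than one label set. The sketch below assumes the "url label:score ..." line format shown earlier in this issue; the function name and the sample lines (with example.com URLs) are made up for illustration, and it only detects such entries without deciding how they should be handled.

```python
from collections import defaultdict


def find_conflicting_urls(lines):
    """Return {url: [label tuples]} for URLs listed with differing labels."""
    labels_by_url = defaultdict(set)
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        url, labels = parts[0], tuple(parts[1:])
        labels_by_url[url].add(labels)
    # Keep only the URLs that occur with more than one distinct label set.
    return {
        url: sorted(label_sets)
        for url, label_sets in labels_by_url.items()
        if len(label_sets) > 1
    }


# Illustrative sample data, not taken from the real URL list.
sample = [
    "http://example.com/a.jpg 865:1 2:1",
    "http://example.com/b.jpg 1042:1 2:1",
    "http://example.com/a.jpg 5170:1 2:1",
]
conflicts = find_conflicting_urls(sample)
for url, label_sets in conflicts.items():
    print(url, label_sets)
```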

@wwfnwg Please see the updated README about the repeated URLs.