Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Filter YFCC data

Hxyou opened this issue

Hi, thanks for the great work. After downloading the provided YFCC15M label file, I can see that each label has three keys: caption, filename, and url. How should we find the corresponding YFCC image from your label? That is, which key should we use to align with the YFCC data?

You can use the url as the key, and the filename as a check.
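For anyone trying this, here is a minimal sketch of the url-as-key idea, not the authors' actual pipeline. It assumes the labels are already loaded as a list of dicts with the caption, filename, and url keys, and that your own YFCC metadata is a list of dicts. The `download_url` and `path` field names are hypothetical, and so is the interpretation of "filename as a check" used here (comparing the label's filename with the last path component of the url).

```python
from os.path import basename
from urllib.parse import urlparse


def build_url_index(labels):
    """Map each label's url to the full label dict for O(1) lookup."""
    return {label["url"]: label for label in labels}


def match_records(labels, yfcc_records, url_field="download_url"):
    """Join local YFCC records to labels, keyed on the url.

    `url_field` is a hypothetical name for the field in your own
    metadata that holds the image download URL.
    Returns (matched, mismatched) lists of (record, label) pairs.
    """
    index = build_url_index(labels)
    matched, mismatched = [], []
    for record in yfcc_records:
        label = index.get(record[url_field])
        if label is None:
            continue  # this record is not part of the subset
        # Assumed sanity check: the label's filename should equal the
        # last path component of its url.
        if basename(urlparse(label["url"]).path) == label["filename"]:
            matched.append((record, label))
        else:
            mismatched.append((record, label))
    return matched, mismatched


# Made-up example data, only to show the shapes involved.
labels = [
    {"caption": "a dog", "filename": "abc.jpg",
     "url": "http://example.com/1/abc.jpg"},
]
yfcc_records = [
    {"download_url": "http://example.com/1/abc.jpg", "path": "images/abc.jpg"},
    {"download_url": "http://example.com/2/zzz.jpg", "path": "images/zzz.jpg"},
]
matched, mismatched = match_records(labels, yfcc_records)
```

With the example data, only the first record is kept, since its URL appears in the labels. If your copy of YFCC stores the URL under a different field, pass that name as `url_field`.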

The image names in the YFCC data seem to be MD5 hashes. I'm also a little confused about how to make the connection.

I am also trying to filter YFCC and I have the same issue. The dataset I downloaded has a very different structure, and I don't know how to find the images based on the filename that you provide. Also, I am not sure what you mean by "Prepare the YFCC15M subset metadata pickle by the label".

My version of YFCC100M looks exactly the same as the one in the SLIP repo. Do you organise the data in a different way?