megvii-research / FSSD_OoD_Detection

Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)


ImageNet dogs vs non-dogs dataset

dylanyd opened this issue · comments

Thanks for your excellent work.

I'm trying to reproduce the experiment result of ImageNet (dogs) vs ImageNet (non-dogs). Could you please describe in detail how you constructed the dataset, or post the related code?

@dylanyd Thanks for your attention. We will make the Dogs dataset and the construction code public soon, after getting permission.

Hi @dylanyd ~ We have published the dataset at this link, and the following is a detailed description of it.

ImageNet-dogs-non-dogs Dataset Description

The following is a detailed description of the construction process of the ImageNet-dogs-non-dogs dataset used in our paper.

Currently, the dataset is available at this link, containing four parts:

| Name | Size | Number of Classes | Description |
| --- | --- | --- | --- |
| dogs50A-train.tar.gz | 50000 | 50 (dogs-A) | training set from ImageNet train set |
| dogs50A-val.tar.gz | 50000 | 50 (dogs-A) | test set from ImageNet val set |
| dogs50B-val.tar.gz | 50000 | 50 (dogs-B) | test set from ImageNet val set (InD test set) |
| non-dogs-val.tar.gz | 10000 | 882 (non-dogs) | non-dog images from ImageNet val set (OoD test set) |

There are 118 dog classes among the 1000 categories of the ImageNet dataset. We chose 100 of them to construct our dogs dataset. The class indexes and corresponding class names can be found in the table below. We name classes 0-49 dogs-A and classes 50-99 dogs-B.

To construct the non-dogs dataset (non-dogs-val.tar.gz), we randomly chose 10000 images belonging to the 882 non-dog classes from the ImageNet validation set. It was used as the out-of-distribution test set in our paper.

dogs50A-train.tar.gz contains 50000 images randomly chosen from the ImageNet training set (100 for each class, indexes 0-49). It was used as the training set for model training in our paper.

dogs50A-val.tar.gz contains 10000 images randomly chosen from the ImageNet training set (100 for each class, indexes 0-49). It was used as the evaluation set during model training in our paper (not used for OoD detection experiments).

dogs50B-val.tar.gz contains 10000 images randomly chosen from the ImageNet training set (100 for each class, indexes 50-99). It was used as the in-distribution test set in our paper.
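The construction code itself has not been posted in this thread, but the per-class random sampling described above can be sketched as follows. This is a minimal sketch, not the authors' actual script; `files_by_class` (a mapping from ImageNet class index to that class's image paths) and the fixed seed are assumptions.

```python
import random

def sample_per_class(files_by_class, class_ids, per_class, seed=0):
    """Randomly pick `per_class` image paths for each class in `class_ids`.

    `files_by_class` is assumed to map an ImageNet class index to the list
    of image paths for that class (e.g. built by scanning the ImageNet
    directory tree). Hypothetical helper, not the authors' code.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return {cid: rng.sample(files_by_class[cid], per_class)
            for cid in class_ids}
```

Calling this once per subset (dogs-A train, dogs-A val, dogs-B val, non-dogs val) with the appropriate class indexes and per-class counts would produce splits of the shape described above.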

class index | ImageNet original class index | class name
0	151	Chihuahua
1	152	Japanese spaniel
2	153	Maltese dog, Maltese terrier, Maltese
3	154	Pekinese, Pekingese, Peke
4	155	Shih-Tzu
5	156	Blenheim spaniel
6	157	papillon
7	158	toy terrier
8	159	Rhodesian ridgeback
9	160	Afghan hound, Afghan
10	161	basset, basset hound
11	162	beagle
12	163	bloodhound, sleuthhound
13	164	bluetick
14	165	black-and-tan coonhound
15	166	Walker hound, Walker foxhound
16	167	English foxhound
17	168	redbone
18	169	borzoi, Russian wolfhound
19	170	Irish wolfhound
20	171	Italian greyhound
21	172	whippet
22	173	Ibizan hound, Ibizan Podenco
23	174	Norwegian elkhound, elkhound
24	175	otterhound, otter hound
25	176	Saluki, gazelle hound
26	177	Scottish deerhound, deerhound
27	178	Weimaraner
28	179	Staffordshire bullterrier, Staffordshire bull terrier
29	180	American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier
30	181	Bedlington terrier
31	182	Border terrier
32	183	Kerry blue terrier
33	184	Irish terrier
34	185	Norfolk terrier
35	186	Norwich terrier
36	187	Yorkshire terrier
37	188	wire-haired fox terrier
38	189	Lakeland terrier
39	190	Sealyham terrier, Sealyham
40	191	Airedale, Airedale terrier
41	192	cairn, cairn terrier
42	193	Australian terrier
43	194	Dandie Dinmont, Dandie Dinmont terrier
44	195	Boston bull, Boston terrier
45	196	miniature schnauzer
46	197	giant schnauzer
47	198	standard schnauzer
48	199	Scotch terrier, Scottish terrier, Scottie
49	200	Tibetan terrier, chrysanthemum dog
50	201	silky terrier, Sydney silky
51	202	soft-coated wheaten terrier
52	203	West Highland white terrier
53	204	Lhasa, Lhasa apso
54	205	flat-coated retriever
55	206	curly-coated retriever
56	207	golden retriever
57	208	Labrador retriever
58	209	Chesapeake Bay retriever
59	210	German short-haired pointer
60	211	vizsla, Hungarian pointer
61	212	English setter
62	213	Irish setter, red setter
63	214	Gordon setter
64	215	Brittany spaniel
65	216	clumber, clumber spaniel
66	217	English springer, English springer spaniel
67	218	Welsh springer spaniel
68	219	cocker spaniel, English cocker spaniel, cocker
69	220	Sussex spaniel
70	221	Irish water spaniel
71	222	kuvasz
72	223	schipperke
73	224	groenendael
74	225	malinois
75	226	briard
76	227	kelpie
77	228	komondor
78	229	Old English sheepdog, bobtail
79	230	Shetland sheepdog, Shetland sheep dog, Shetland
80	231	collie
81	232	Border collie
82	233	Bouvier des Flandres, Bouviers des Flandres
83	234	Rottweiler
84	235	German shepherd, German shepherd dog, German police dog, alsatian
85	236	Doberman, Doberman pinscher
86	237	miniature pinscher
87	238	Greater Swiss Mountain dog
88	239	Bernese mountain dog
89	240	Appenzeller
90	241	EntleBucher
91	242	boxer
92	243	bull mastiff
93	244	Tibetan mastiff
94	245	French bulldog
95	246	Great Dane
96	247	Saint Bernard, St Bernard
97	248	Eskimo dog, husky
98	249	malamute, malemute, Alaskan malamute
99	250	Siberian husky
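Since the 100 chosen classes occupy the contiguous original ImageNet indices 151-250, the remapping shown in the table above reduces to a simple offset. A small sketch based on the table (function names are mine, not from the repo):

```python
def to_dataset_index(imagenet_idx):
    """Map an original ImageNet class index (151-250) to the 0-99 index
    used by the dogs dataset, per the table above."""
    if not 151 <= imagenet_idx <= 250:
        raise ValueError("not one of the 100 chosen dog classes")
    return imagenet_idx - 151

def subset_of(imagenet_idx):
    """'dogs-A' covers dataset indices 0-49, 'dogs-B' covers 50-99."""
    return "dogs-A" if to_dataset_index(imagenet_idx) < 50 else "dogs-B"
```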

Hi @andrehuang @PKUCSS @dylanyd,

Normally, to train a dog vs. non-dog classifier, we would have non-dog images in the training data too. Does this mean they are also in-distribution, since they are involved in training? For example, when testing the Mahalanobis approach on this dataset, do we have only one mean (dogs) and one covariance, or two means (dogs and non-dogs) and a covariance?

Let me ask about my scenario. I have a classification task, human vs. non-human, and I want to add an OoD setup on top. My training dataset consists of scenes with humans and scenes without humans (mostly empty, not containing another object). In this setup, do I have one in-distribution class (human scenes) or two (human scenes and empty scenes)? Or, in my scenario, should everything other than human scenes, such as empty scenes and scenes with cats, dogs, or vacuum cleaners, be called OoD? If I understood correctly, for the dogs vs. non-dogs dataset in your approach (FSSD), you only consider dog images as in-distribution and the rest as OoD.

Thanks in advance.

Hi muskaya,

Thanks for the question.

> Normally, to train a dog vs. non-dog classifier, we would have non-dog images in the training data too.

This is not true. The classification task in our case is to classify different dog breeds (note there are 100 dog categories, as listed above).

In your scenario, you should try to use only human scenes as your training data. The problem seems to be that your labels are only "human" or "objects", without more fine-grained human classes like "old", "woman", etc.

My personal suggestion is to consider contrastive training, such as instance discrimination or SimCLR, so that you have a classification task on humans only and the feature activations relate to different types of humans.

Hi @andrehuang

Thank you so much for the answer.

Yes, I am simply trying to detect human presence or absence (independent of type). However, in my scenario I am not using images but radar signal sequences, and I already have a pre-trained model (trained using human scenes and empty scenes). Currently, it is practically impossible for me to re-train the architecture using only human scenes. In this setup I have two thoughts based on your approach:

  • Assigning label 1 (InD) only to the human scenes and label 0 (OoD) to the rest, even though I have OoD data (empty scenes) during training
  • Considering both the human scenes and the empty scenes as in-distribution, and the rest as OoD

Do you think either of these would work in my setup using your approach?

Thanks in advance!

If your "OoD" (empty scenes) is in the training data, and you don't expect other types of OoD data to appear (say, data from other sources), then you can simply use binary classification (your first approach). This is not really OoD detection per se, and you can directly use the softmax prediction instead of these OoD detection approaches. I think the Mahalanobis distance to the human class might work as well (a single mean and a single covariance of the human class).
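The single-mean, single-covariance Mahalanobis score suggested above could look like the following NumPy sketch. It assumes `feats` holds penultimate-layer features of the in-distribution (human) class; this is an illustration of the idea, not the repo's implementation.

```python
import numpy as np

def fit_ind_gaussian(feats):
    """Fit one Gaussian to the in-distribution (human) features.
    feats: (N, D) array of penultimate-layer features."""
    mean = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    prec = np.linalg.pinv(cov)  # pseudo-inverse for numerical stability
    return mean, prec

def maha_score(x, mean, prec):
    """Squared Mahalanobis distance to the single InD class.
    Larger score -> farther from the human class -> more likely OoD."""
    d = np.asarray(x) - mean
    return float(d @ prec @ d)
```

Thresholding `maha_score` then plays the role of the OoD detector: samples far from the human-class Gaussian are rejected.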

Thank you very much for the answer @andrehuang
Yes, the current setup runs that way, and it covers most cases (human or empty). However, when a cat, a dog, or a vacuum cleaner appears in front of the radar, we want the model to say "I don't know". I can say that the OoD data in our scenario are the other objects that can move around.

In that case, I think your second approach (which can use any OoD detection method) can be used to detect and filter out such real OoD data first. Then you could do the human/non-human binary classification.
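The two-stage idea above, first filtering with an OoD score and then running the binary classifier, can be sketched as follows; the function names and the threshold are hypothetical, and `ood_score` could be any detector (FSSD, Mahalanobis, softmax confidence, ...).

```python
def predict(x, ood_score, classifier, threshold):
    """Stage 1: reject samples whose OoD score exceeds the threshold
    (cats, dogs, vacuum cleaners, ...). Stage 2: human/empty classification.
    `ood_score` and `classifier` are hypothetical callables."""
    if ood_score(x) > threshold:
        return "unknown"
    return "human" if classifier(x) else "empty"
```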

@andrehuang thank you very much for your very quick and informative responses :)