tiny_imagenet_dataset - wrong labels in validation set
statist-bhfz opened this issue · comments
The validation dataloader from the example tinyimagenet-alexnet always returns label 1:
```r
library(torch)
library(torchvision)

dir <- "./Downloads/tiny-imagenet"
device <- if (cuda_is_available()) "cuda" else "cpu"

to_device <- function(x, device) {
  x$to(device = device)
}

valid_ds <- tiny_imagenet_dataset(
  dir,
  download = TRUE,
  split = "val",
  transform = function(x) {
    x %>%
      transform_to_tensor() %>%
      to_device(device) %>%
      transform_resize(c(64, 64))
  }
)

valid_dl <- dataloader(valid_ds, batch_size = 64, shuffle = FALSE, drop_last = TRUE)
```
```r
dataloader_next(dataloader_make_iter(valid_dl))[[2]]
```

```
torch_tensor
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
... [the output was truncated (use n=-1 to disable)]
[ CPULongType{64} ]
```
```r
> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Russian_Ukraine.utf8
[2] LC_CTYPE=Russian_Ukraine.utf8
[3] LC_MONETARY=Russian_Ukraine.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=Russian_Ukraine.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

other attached packages:
[1] reprex_2.0.2      torchvision_0.5.1 torch_0.10.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10      rstudioapi_0.14  knitr_1.42
 [4] magrittr_2.0.3   bit_4.0.5        R6_2.5.1
 [7] jpeg_0.1-10      rlang_1.1.0      fastmap_1.1.1
[10] fansi_1.0.4      tools_4.2.3      xfun_0.38
[13] coro_1.0.3       utf8_1.2.3       cli_3.6.1
[16] clipr_0.8.0      withr_2.5.0      htmltools_0.5.5
[19] yaml_2.3.7       digest_0.6.31    bit64_4.0.5
[22] tibble_3.2.1     lifecycle_1.0.3  processx_3.8.1
[25] callr_3.7.3      vctrs_0.6.2      fs_1.6.1
[28] ps_1.7.5         evaluate_0.20    glue_1.6.2
[31] rmarkdown_2.21   compiler_4.2.3   pillar_1.9.0
[34] pkgconfig_2.0.3
```
The same thing happens when iterating over the test dataset; only the train set contains correct labels.

I think that's because the validation set is not shuffled (as per the dataloader creation). When you look at the complete set, you should get all 200 labels:
```r
> valid_dl <- dataloader(valid_ds, batch_size = 10000, shuffle = FALSE, drop_last = TRUE)
> labels <- dataloader_next(dataloader_make_iter(valid_dl))[[2]]
> unique(as.numeric(labels))
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 [20]  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
 [39]  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
 [58]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
 [77]  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
 [96]  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
[115] 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
[134] 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
[153] 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
[172] 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
[191] 191 192 193 194 195 196 197 198 199 200
```
Does this work for you?
@skeydan yes, it works; my mistake was creating a new iterator on every sequential call to `dataloader_next()`.
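For reference, a minimal sketch of the correct iteration pattern, assuming the `valid_dl` from the reprex above: the iterator must be created once and reused, otherwise every call starts over at the first batch.

```r
# Create the iterator once; each dataloader_next() call then advances it.
# Wrapping every dataloader_next() in a fresh dataloader_make_iter()
# restarts iteration at batch 1, so you keep seeing the first batch,
# whose labels are all 1 because the val split is ordered by class.
it <- dataloader_make_iter(valid_dl)
batch1 <- dataloader_next(it)   # first batch: labels all 1
batch2 <- dataloader_next(it)   # second batch: iteration has advanced

# Alternatively, loop over all batches with coro::loop():
coro::loop(for (b in valid_dl) {
  labels <- b[[2]]   # per-batch labels
})
```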
But I still can't understand why the example https://torchvision.mlverse.org/articles/examples/tinyimagenet-alexnet.html doesn't work:
```
[epoch 1]: Loss = 5.299575, Acc= 0.025040
[epoch 2]: Loss = 5.299323, Acc= 0.025040
[epoch 3]: Loss = 5.299293, Acc= 0.025040
[epoch 4]: Loss = 5.299235, Acc= 0.025040
```
With `model_resnet18()` it looks somewhat better, but is still confusing:
```
[epoch 1]: Loss = 4.568783, Acc= 0.064203
[epoch 2]: Loss = 3.636775, Acc= 0.065905
[epoch 3]: Loss = 3.179632, Acc= 0.079127
[epoch 4]: Loss = 2.820852, Acc= 0.080429
[epoch 5]: Loss = 2.472266, Acc= 0.087740
[epoch 6]: Loss = 2.094413, Acc= 0.088442
[epoch 7]: Loss = 1.662074, Acc= 0.084135
[epoch 8]: Loss = 1.191983, Acc= 0.078125
[epoch 9]: Loss = 0.746314, Acc= 0.083333
[epoch 10]: Loss = 0.440392, Acc= 0.082833
[epoch 11]: Loss = 0.281548, Acc= 0.075020
```
In the example,

```r
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]$add(1L)
```

should be

```r
pred <- torch_topk(pred, k = 5, dim = 2, TRUE, TRUE)[[2]]
```
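The off-by-one is consistent with R torch's 1-based indexing: `torch_topk()` already returns 1-based class indices, so the extra `$add(1L)` shifts every prediction one class up. A small self-contained sketch of top-5 accuracy with the corrected line (the `output` and `target` names are illustrative, not from the example):

```r
library(torch)

output <- torch_randn(8, 200)        # 8 samples, logits over 200 classes
target <- torch_randint(1, 201, 8)   # 1-based class labels

# Indices returned by torch_topk() are already 1-based in R torch,
# so no $add(1L) is needed.
pred <- torch_topk(output, k = 5, dim = 2, largest = TRUE, sorted = TRUE)[[2]]

# Top-5 accuracy: a sample counts as a hit if any of its 5 predicted
# classes equals the target class.
hits <- (pred == target$unsqueeze(2))$any(dim = 2)
acc5 <- hits$to(dtype = torch_float())$mean()$item()
```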
Now the model predicts the right labels, but alexnet with `optim_adam` still doesn't learn anything, at least during the first several epochs, while resnet18 reaches a top-5 accuracy of 0.37 after the first epoch. `optim_adagrad(model$parameters, lr = 0.005)` works much better:
```
# alexnet with optim_adam(model$parameters)
[epoch 1]: Loss = 5.299603, Acc= 0.025040
[epoch 2]: Loss = 5.299326, Acc= 0.025040

# alexnet with optim_adagrad(model$parameters, lr = 0.005)
[epoch 1]: Loss = 7.493342, Acc= 0.112881
[epoch 2]: Loss = 4.772662, Acc= 0.175381
```
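For anyone reproducing the comparison, the change is a one-line optimizer swap in the example's setup (a sketch; `model` stands for the AlexNet instance created earlier in the example):

```r
# With Adam, AlexNet stalls: Acc stays at ~0.025 across epochs.
# optimizer <- optim_adam(model$parameters)

# Adagrad with a small learning rate starts learning from epoch 1.
optimizer <- optim_adagrad(model$parameters, lr = 0.005)
```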
Hi @statist-bhfz,
I can confirm your improvements work better than the original setting!
```
[epoch 1]: Loss = 6.557040, Acc= 0.105569
[epoch 2]: Loss = 4.818115, Acc= 0.162760
[epoch 3]: Loss = 4.612732, Acc= 0.209936
[epoch 4]: Loss = 4.457637, Acc= 0.241587
[epoch 5]: Loss = 4.362794, Acc= 0.255909
```
Merging your PR, thanks for the contribution!