Question about the versions in the package requirements
EillotY opened this issue · comments
Hello, this project uses PyTorch, not TensorFlow. The library versions are:
Python 3.5.2
PyTorch 1.1.0
OpenCV 3.4.0
Numpy 1.14.5
MatPlotLib 2.2.2
Scipy 1.0.0
Hope this helps.
Hi EillotY,
When you train on this dataset, the results vary a lot depending on the fold used for testing; for instance, these are the results I got for each fold:
| | MAE | MSE |
|---|---|---|
| fold 1 | 531.64 | 880.12 |
| fold 2 | 356.61 | 558.70 |
| fold 3 | 134.40 | 154.99 |
| fold 4 | 227.12 | 265.00 |
| fold 5 | 158.87 | 219.02 |
| AVERAGE | 281.73 | 415.56 |
This just shows the importance of using k-fold cross-validation. This link may be interesting for you:
https://towardsdatascience.com/why-and-how-to-do-cross-validation-for-machine-learning-d5bd7e60c189
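As a toy illustration, the AVERAGE row in the table is just the plain mean over the five folds (the per-fold values below are copied from the table; small rounding differences are possible since the individual entries are themselves rounded):

```python
# Per-fold errors from the table above; averaging them gives the AVERAGE row.
fold_mae = [531.64, 356.61, 134.40, 227.12, 158.87]
fold_mse = [880.12, 558.70, 154.99, 265.00, 219.02]

avg_mae = sum(fold_mae) / len(fold_mae)
avg_mse = sum(fold_mse) / len(fold_mse)

print(round(avg_mae, 2))  # 281.73
```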
Also notice that the authors of the dataset suggest using this protocol.
I don't think the results will improve much more in fold 1; in my experience that fold always yields values in that range.
About the bug on Windows: that's expected, as the code was only tested on Linux. Anyway, thanks for pointing out the solution; I hope it helps other people using Windows as well :)
After more than 10 days of training I got a rather poor result: I did not reach the MAE/MSE you reported as better than the paper's; mine is slightly worse than the original paper, with an average MAE around 300. Is this random? It may be because of the generative adversarial network (the original paper also does not say for how many iterations the generator is trained, which could be one reason). Another question: although the data was split into 5 folds in advance and each fold's test_img is different, during training both the validation runs and the final test of every fold use only the first 10 images; the subsequent images are never tested. I checked the code but have not found an error yet. Perhaps this is why my result is only around 300. Thanks.
Hi EillotY,
Sorry, I was not able to understand your comment; can you rephrase it, please?
If it helps, refer to #1 to find the trained models for ucf_cc_50
After more than 10 days of training, I got a rather poor result. I did not get the MAE/MSE you reported as better than the original paper's; mine is a little worse than the original paper, with an average MAE of about 300. Is this just random?
There is another problem: although the data was prepared in advance and divided into 5 folds, with a different test_img set in each fold, during training the test at the end of each fold always uses the first 10 pictures (1-10); the subsequent pictures (11-50) are never tested. I checked the code and have not found the error. Perhaps this is why my training result is only around 300. Thanks.
About your results: did you use the parameter `--people-thr 20` when augmenting the data? By default it uses `--people-thr 0`.
About the problem with the folds, the script `dataset_loader.py` creates the train and test sets:
```python
for fold in range(5):
    fold_dir = osp.join(self.augmented_dir, 'fold{}'.format(fold + 1))
    aug_train_dir_img = osp.join(fold_dir, 'train_img')
    aug_train_dir_den = osp.join(fold_dir, 'train_den')
    aug_train_dir_lab = osp.join(fold_dir, 'train_lab')
    fold_test_dir_img = osp.join(fold_dir, 'test_img')
    fold_test_dir_den = osp.join(fold_dir, 'test_den')
    fold_test_dir_lab = osp.join(fold_dir, 'test_lab')
    mkdir_if_missing(aug_train_dir_img)
    mkdir_if_missing(aug_train_dir_den)
    mkdir_if_missing(aug_train_dir_lab)
    mkdir_if_missing(fold_test_dir_img)
    mkdir_if_missing(fold_test_dir_den)
    mkdir_if_missing(fold_test_dir_lab)
    kwargs['name'] = 'ucf-fold{}'.format(fold + 1)

    ##### TRAIN/TEST DIRS ARE SAVED HERE #####
    train_test = train_test_unit(aug_train_dir_img, aug_train_dir_den,
                                 fold_test_dir_img, fold_test_dir_den,
                                 kwargs.copy())
    self.train_test_set.append(train_test)

    if augment_data:
        ##### HERE WE SPLIT THE DATA #####
        test_img = img_names[fold * 10: (fold + 1) * 10]
        test_ids = img_ids[fold * 10: (fold + 1) * 10]
        test_den_paths = [osp.join(self.ori_dir_den, img_id + 'npy') for img_id in test_ids]
        test_lab_paths = [osp.join(self.ori_dir_lab, img_id + 'json') for img_id in test_ids]
        test_img_paths = [osp.join(self.ori_dir_img, img) for img in test_img]
        train_img = sorted(list(set(img_names) - set(test_img)))
        train_ids = sorted(list(set(img_ids) - set(test_ids)))
        train_den_paths = [osp.join(self.ori_dir_den, img_id + 'npy') for img_id in train_ids]
        train_lab_paths = [osp.join(self.ori_dir_lab, img_id + 'json') for img_id in train_ids]
        train_img_paths = [osp.join(self.ori_dir_img, img) for img in train_img]

        # augment train data
        print("Augmenting {}".format(kwargs['name']))
        augment(train_img_paths, train_lab_paths, train_den_paths,
                aug_train_dir_img, aug_train_dir_lab, aug_train_dir_den,
                slide_window_params, noise_params, light_params)
        copy_to_directory(test_den_paths, fold_test_dir_den)
        copy_to_directory(test_lab_paths, fold_test_dir_lab)
        copy_to_directory(test_img_paths, fold_test_dir_img)
```
It is expected that each fold's test_img set is different; again, this is a consequence of k-fold cross-validation.
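As a quick sanity check of that slicing (the file names here are hypothetical stand-ins; the real ones come from the dataset directory), the five 10-image test slices are disjoint and together cover all 50 images:

```python
# Hypothetical image names standing in for the real UCF-CC-50 files.
img_names = sorted('img_{:04d}.jpg'.format(i) for i in range(1, 51))

# Same slicing as in dataset_loader.py: 10 test images per fold.
folds = [img_names[fold * 10:(fold + 1) * 10] for fold in range(5)]

all_test = [name for fold in folds for name in fold]
assert len(all_test) == 50          # every image is tested exactly once
assert len(set(all_test)) == 50     # no image appears in two test folds
```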
EillotY
I understand your question better now. The UCF-CC-50 dataset has 50 images and the validation protocol is 5-fold cross-validation, which means that:
- fold 1: images 1-10 for test and the remaining for train
- fold 2: images 11-20 for test and the remaining for train
- fold 3: images 21-30 for test and the remaining for train
- fold 4: images 31-40 for test and the remaining for train
- fold 5: images 41-50 for test and the remaining for train
You can further read why this protocol is important here.
About the parameter `--people-thr NUMBER`: it is a threshold indicating the minimum number of people each training patch must contain. In general, the larger this value, the less training data and the shorter the training time, and in some cases it can improve the network's results [1]. Note that by default `--people-thr` is set to 0, and raising it may not always improve results.
For the reported results I used `--people-thr 20` because at that time I already had the training data generated for that threshold.
[1] http://www.liv.ic.unicamp.br/~quispe/papers/crowd_counting_iwssip.pdf
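To make the threshold's effect concrete, here is a hypothetical sketch (`filter_patches` is not the repository's actual function) of discarding training patches whose ground-truth count falls below the threshold — a larger threshold means fewer surviving patches:

```python
def filter_patches(patch_counts, people_thr=0):
    """Keep only patches containing at least `people_thr` people."""
    return [c for c in patch_counts if c >= people_thr]

# Hypothetical ground-truth people counts for seven training patches.
counts = [3, 0, 45, 18, 120, 7, 25]
print(len(filter_patches(counts, people_thr=0)))   # 7: nothing discarded
print(len(filter_patches(counts, people_thr=20)))  # 3: only 45, 120 and 25 remain
```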
I pointed out these differences at the end of the README.
I implemented it that way because of problems with the tensor sizes in the encoder/decoder. I don't know which framework the original authors used, but when using PyTorch and applying a 4x4 conv in the last step, the output of the encoder was too small; I'm not completely sure, but I remember it effectively vanished. There are other differences as well; I explained all of them at the end of the README.
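The shrinkage can be sketched with the standard convolution output-size formula; the numbers below are purely illustrative and do not come from the repository's actual layer configuration:

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# Each 4x4 stride-2 conv halves the spatial size; on a small input,
# a few such steps collapse the feature map toward 1x1.
size = 64                   # hypothetical input resolution
for _ in range(5):
    size = conv_out(size)
print(size)                 # 2
assert conv_out(size) == 1  # one more 4x4 conv and it effectively vanishes
```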
I don't consider any of these differences too relevant, as the results obtained by my implementation are comparable with those reported by the authors.