RQuispeC / pytorch-ACSCP

Unofficial implementation of "Crowd Counting via Adversarial Cross-Scale Consistency Pursuit" with pytorch - CVPR 2018


Question about the versions in the package requirements

EillotY opened this issue

[screenshot: package requirements]
Do the packages have to be exactly these versions? The packages I am using are all newer than these (I don't fully understand this part). I assumed newer versions would be backward compatible with the older ones, but the code does not run. Thanks!

Hello, this project uses PyTorch, not TensorFlow. The library versions are:

Python 3.5.2
PyTorch 1.1.0
OpenCV 3.4.0
Numpy 1.14.5
MatPlotLib 2.2.2
Scipy 1.0.0

Hope this helps.
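
If you want to double-check what is installed on your side, a quick sketch (plain Python, nothing project-specific):

    import torch, cv2, numpy, matplotlib, scipy

    # Print the installed versions to compare against the list above.
    print("PyTorch:", torch.__version__)
    print("OpenCV:", cv2.__version__)
    print("NumPy:", numpy.__version__)
    print("Matplotlib:", matplotlib.__version__)
    print("SciPy:", scipy.__version__)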

Thanks anyway, it runs now. I'd like to ask you about training: for fold 1 it is not converging very fast; after 270 epochs the best MAE is still from epoch 159, with MAE 519.9. Will it keep converging? Did you see the same behaviour when you trained? Also, what is the benefit of finishing training on fold 1 and then training folds 2-5 afterwards? And one more bug: when running the code on Windows, in the copy_to_directory function:
[screenshot: copy_to_directory code]

the split needs a double backslash to separate the file path components; when running the code on Linux there is no problem.
Thanks!

Hi EillotY,

When you train on this dataset, the results vary a lot depending on the fold you use for testing; for instance, the results I got for each fold were:

         MAE     MSE
fold 1   531.64  880.12
fold 2   356.61  558.70
fold 3   134.40  154.99
fold 4   227.12  265.00
fold 5   158.87  219.02
AVERAGE  281.73  415.56
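
The AVERAGE row is just the mean over the five folds; a quick sketch to reproduce it:

    # Mean MAE/MSE over the five folds reported above.
    fold_mae = [531.64, 356.61, 134.40, 227.12, 158.87]
    fold_mse = [880.12, 558.70, 154.99, 265.00, 219.02]
    print(sum(fold_mae) / len(fold_mae))  # ~281.7
    print(sum(fold_mse) / len(fold_mse))  # ~415.6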

This just shows the importance of using k-fold cross-validation. This link may be interesting for you:

https://towardsdatascience.com/why-and-how-to-do-cross-validation-for-machine-learning-d5bd7e60c189

Also notice that the authors of the dataset suggest using this protocol.

I don't think the results will improve much more on fold 1; in my experience that fold always ends up with values in that range.

About the bug on Windows, that's expected, as the code was tested only on Linux. Anyway, thanks for pointing out the solution; hope it helps other people using Windows as well :)
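
For reference, a hypothetical, platform-independent sketch of such a copy helper (the repo's actual copy_to_directory may have a different signature); os.path.basename understands the separators of the platform it runs on, so no manual split on '/' or '\\' is needed:

    import os.path as osp
    import shutil

    # Hypothetical sketch, not the repo's actual implementation:
    # copy each file into dst_dir, letting os.path handle the separators.
    def copy_to_directory(file_paths, dst_dir):
        for path in file_paths:
            shutil.copyfile(path, osp.join(dst_dir, osp.basename(path)))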

After more than 10 days of training I got a result that is not very good. I did not get MAE/MSE better than the paper, as you reported; mine is actually a bit worse than the original paper, with an average MAE of around 300. Is this just random? The adversarial network could be one reason, since the generated density maps depend on it (the original paper also does not say how many times to train the generator). One more question: although the data is split into 5 folds in advance and the test_img of each fold is different, during training both the validation and the final test of every fold use the first 10 images; the later images are never tested. I looked through the code and could not find the error. Maybe this is why my result is only around 300. Thanks.

Hi EillotY,

Sorry, I was not able to understand your comment; can you rephrase it, please?

If it helps, refer to #1 to find the trained models for ucf_cc_50

After more than 10 days of training, I got a result that is not very good. I did not get MAE/MSE better than the original paper as you did; mine is a little worse than the original paper, with an average MAE of about 300. Is this just random?
There is another problem: although the data is prepared in advance and divided into 5 folds, and the test_img of each fold is different, during training the final test of every fold only uses the first 10 pictures (1-10); the later pictures (11-50) are never tested. I checked the code and could not find the error. Perhaps this is why my result is only around 300. Thanks.

About your results, did you use the parameter --people-thr 20 when augmenting the data? By default it uses --people-thr 0.

About the problem with the folds, the script dataset_loader.py creates the train and validation sets:

for fold in range(5):
    fold_dir = osp.join(self.augmented_dir, 'fold{}'.format(fold + 1))
    aug_train_dir_img = osp.join(fold_dir, 'train_img')
    aug_train_dir_den = osp.join(fold_dir, 'train_den')
    aug_train_dir_lab = osp.join(fold_dir, 'train_lab')
    fold_test_dir_img = osp.join(fold_dir, 'test_img')
    fold_test_dir_den = osp.join(fold_dir, 'test_den')
    fold_test_dir_lab = osp.join(fold_dir, 'test_lab')

    mkdir_if_missing(aug_train_dir_img)
    mkdir_if_missing(aug_train_dir_den)
    mkdir_if_missing(aug_train_dir_lab)
    mkdir_if_missing(fold_test_dir_img)
    mkdir_if_missing(fold_test_dir_den)
    mkdir_if_missing(fold_test_dir_lab)

    kwargs['name'] = 'ucf-fold{}'.format(fold + 1)
    ##### TRAIN / TEST DIRS ARE SAVED HERE #####
    train_test = train_test_unit(aug_train_dir_img, aug_train_dir_den, fold_test_dir_img, fold_test_dir_den, kwargs.copy())
    self.train_test_set.append(train_test)

    if augment_data:
        ##### HERE WE SPLIT THE DATA #####
        test_img = img_names[fold * 10: (fold + 1) * 10]
        test_ids = img_ids[fold * 10: (fold + 1) * 10]
        test_den_paths = [osp.join(self.ori_dir_den, img_id + 'npy') for img_id in test_ids]
        test_lab_paths = [osp.join(self.ori_dir_lab, img_id + 'json') for img_id in test_ids]
        test_img_paths = [osp.join(self.ori_dir_img, img) for img in test_img]

        train_img = sorted(list(set(img_names) - set(test_img)))
        train_ids = sorted(list(set(img_ids) - set(test_ids)))
        train_den_paths = [osp.join(self.ori_dir_den, img_id + 'npy') for img_id in train_ids]
        train_lab_paths = [osp.join(self.ori_dir_lab, img_id + 'json') for img_id in train_ids]
        train_img_paths = [osp.join(self.ori_dir_img, img) for img in train_img]

        # augment train data
        print("Augmenting {}".format(kwargs['name']))
        augment(train_img_paths, train_lab_paths, train_den_paths, aug_train_dir_img, aug_train_dir_lab, aug_train_dir_den, slide_window_params, noise_params, light_params)
        copy_to_directory(test_den_paths, fold_test_dir_den)
        copy_to_directory(test_lab_paths, fold_test_dir_lab)
        copy_to_directory(test_img_paths, fold_test_dir_img)

It is expected that each fold's test_img is different; again, this is because of k-fold cross-validation.

Hi EillotY,

I understand your question better now. The UCF-CC-50 dataset has 50 images and the validation protocol is 5-fold cross-validation, which means that (see the sketch after this list):

- fold 1: images 1-10 for test and the remaining for train
- fold 2: images 11-20 for test and the remaining for train
- fold 3: images 21-30 for test and the remaining for train
- fold 4: images 31-40 for test and the remaining for train
- fold 5: images 41-50 for test and the remaining for train
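
A minimal sketch of that split, mirroring the slicing done in dataset_loader.py shown earlier (the ids 1-50 here are just for illustration):

    # Illustrative only: split 50 image ids (1-50 here) into the 5 folds above.
    img_ids = list(range(1, 51))
    for fold in range(5):
        test = img_ids[fold * 10:(fold + 1) * 10]       # 10 test images per fold
        train = [i for i in img_ids if i not in test]   # remaining 40 for training
        print("fold {}: test {}-{}, {} train images".format(fold + 1, test[0], test[-1], len(train)))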

You can read further about why this protocol is important in the link I shared above.

About the parameter --people-thr NUMBER: it is a threshold (NUMBER) that sets the minimum number of people each training patch must contain. In general, the bigger this value, the less training data you get and the shorter the training time, and in some cases it can improve the results of the network [1]. Please note that by default --people-thr is set to 0 and that increasing it may not always improve results.

For the results reported I used --people-thr 20, because at that time I already had the augmented data generated for that threshold.

[1] http://www.liv.ic.unicamp.br/~quispe/papers/crowd_counting_iwssip.pdf
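
As a rough illustration of what such a threshold does during augmentation (the function name below is hypothetical, not the repo's actual API): a patch is kept only if its ground-truth count reaches the threshold, and since a density map integrates to the people count, the count can be read off as its sum.

    # Hypothetical sketch: keep only patches whose ground-truth count >= people_thr.
    def filter_patches(patches, density_maps, people_thr=0):
        kept = []
        for patch, den in zip(patches, density_maps):
            if den.sum() >= people_thr:  # the density map sums to the people count
                kept.append((patch, den))
        return kept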

Sorry, I finally see what is happening there. Thank you very much.
I also found a difference, shown below:
[screenshot]
[screenshot]
In G_larger, encoder_8 is (3,3,64) but the paper has (4,4,64) at layer 8, and decoder_8 is (6,6,1) but the paper has (6,6,3) at layer 16.
G_small has the same differences.
So why the difference? Were those your changes?

I pointed out these differences at the end of the README.
I implemented it that way because of some problems with the tensors' sizes in the encoder/decoder. I don't know which framework the original authors used, but with PyTorch, applying a 4x4 convolution in the last step made the output of the encoder too small; I'm not completely sure, but I remember it vanished. There are other differences as well, and I explained all of them at the end of the README.
I don't consider any of this too relevant, as the results obtained by my implementation are comparable with the ones reported by the authors.
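
To illustrate the size problem (toy numbers, not the repo's actual layers): with stride 1 and no padding, the output size is input - kernel + 1, so a 4x4 kernel on a small bottleneck feature map leaves almost no spatial resolution, while a 3x3 kernel keeps a bit more.

    import torch
    import torch.nn as nn

    # Toy example of how the kernel size shrinks a small feature map.
    x = torch.randn(1, 64, 4, 4)                       # small bottleneck feature map
    print(nn.Conv2d(64, 64, kernel_size=4)(x).shape)   # torch.Size([1, 64, 1, 1])
    print(nn.Conv2d(64, 64, kernel_size=3)(x).shape)   # torch.Size([1, 64, 2, 2])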