kracwarlock / action-recognition-visual-attention

Action recognition using soft attention based deep recurrent neural networks

Home Page: http://www.cs.toronto.edu/~shikhar/projects/action-recognition-attention


Your time for one 128-image batch?

thanhnguyentang opened this issue

Hi @kracwarlock ,

This is my first time training a net with Theano, and I wonder whether my setup is wrong, since training takes so long even though it prints that my GPU is being used. When I train networks in Caffe it is much faster. Do you remember roughly how many seconds one 128-image batch took in your training? It takes me about 60 seconds per batch.

Thank you.

I am sorry to bring up something unrelated to your question, but you seem to be the only one active on this project these days, so I want to ask: have you extracted features successfully using the author's script?
Thanks a lot! I have just begun working on this project.

Hi @cloudandcat ,

Yes, I am training it now. The author's code is the base on which I wrote some additional scripts to prepare the data.
I am active because I am adapting this code and its ideas for my course project. I just realized it is slow because of the for loop that reloads the data every time a batch is trained. This can be solved by loading all the batches before training starts.

Hey hi

I don't remember the times unfortunately, but yes, the provided data handler is slow. I used to load all batches into memory at once (or as many as would fit), and that was very fast.
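The idea is simply to pay the slow loading cost once, before the training loop, rather than on every iteration. A minimal sketch, assuming a hypothetical handler with a `get_batch(i)` method (the names `preload_batches`, `get_batch`, and `train_on` are illustrative, not this repo's actual API):

```python
# Sketch: cache every batch in RAM up front, then train from the cache.
def preload_batches(data_handler, n_batches):
    cached = []
    for i in range(n_batches):
        x, mask, y = data_handler.get_batch(i)  # slow disk/preprocessing path
        cached.append((x, mask, y))
    return cached

# Training then touches only the in-memory copies:
# batches = preload_batches(handler, n_batches)
# for epoch in range(max_epochs):
#     for x, mask, y in batches:
#         train_on(x, mask, y)
```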

Hi @thanh-ng, @kracwarlock

Here is the output of my test (including the execution params). I'm running on a GTX 980 Ti GPU with 16 GB of RAM:

```
sudo THEANO_FLAGS='floatX=float32,device=gpu,mode=FAST_RUN,nvcc.fastmath=True' python -m scripts.evaluate_ucf11
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, CuDNN not available)
GPU Lock Acquired
Anything printed here will end up in the output directory for job #0

{'decay_c': [1e-05], 'patience': [10], 'n_layers_init': [1], 'dim_out': [512], 'max_epochs': [3], 'dispFreq': [20], 'validFreq': [100], 'temperature_inverse': [1], 'reload': [False], 'n_layers_att': [1], 'fps': [30], 'ctx_dim': [1024], 'valid_batch_size': [128], 'n_actions': [11], 'training_stride': [1], 'optimizer': ['adam'], 'alpha_c': [0.0], 'dictionary': [None], 'learning_rate': [0.0001], 'batch_size': [128], 'selector': [False], 'last_n': [30], 'dataset': ['ucf11'], 'ctx2out': [False], 'dim': [512], 'use_dropout': [True], 'testing_stride': [1], 'n_layers_out': [1], 'maxlen': [30], 'model': ['model_ucf11.npz'], 'saveFreq': [100]}

Booting up all data handlers
Dataset size 70370
Dataset size 70370
Dataset size 86981
Dataset size 58612
Data handlers ready

Building model

Optimization

Epoch 0
Epoch 0 Update 20 Cost 2366.36303711 PD 3.90953922272 UD 1.55029010773
Epoch 0 Update 40 Cost 897.009155273 PD 3.23121905327 UD 1.48270797729
Epoch 0 Update 60 Cost 294.29006958 PD 2.84509396553 UD 1.40387296677
Epoch 0 Update 80 Cost 216.007827759 PD 2.76406693459 UD 1.41400504112
Epoch 0 Update 100 Cost 62.8925018311 PD 2.7870850563 UD 1.43792915344
Saving... Done
```

Hi @kracwarlock, @GerardoHH
Sorry for the late reply, and thank you for your answers. In case someone is still active on this project and wants to improve the data-loading time: what I did was keep the loop, but instead of loading data on every iteration, the loop only computes indices; I then slice the data with those indices once, outside the loop.
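A minimal sketch of that index-first pattern, assuming the data live in one NumPy array (the names `gather_clips`, `features`, `start_frames`, and `clip_len` are illustrative, not this repo's code):

```python
import numpy as np

def gather_clips(features, start_frames, clip_len):
    """Collect integer indices inside the loop, slice the array only once."""
    indices = []
    for start in start_frames:                  # cheap: integers only
        indices.extend(range(start, start + clip_len))
    # one vectorized fancy-indexing slice instead of a copy per iteration
    return features[np.asarray(indices)]
```

The loop body then does no array copying at all, so the per-batch cost collapses to a single NumPy gather.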

Best,
-Thanh