kracwarlock / action-recognition-visual-attention

Action recognition using soft attention based deep recurrent neural networks

Home Page: http://www.cs.toronto.edu/~shikhar/projects/action-recognition-attention


Your time for one 128-image batch?

thanhnguyentang opened this issue

Hi @kracwarlock ,

This is my first time training a net with Theano, and I wonder whether my setup is wrong, since training takes so long even though it prints that my GPU is being used. When I train networks in Caffe it is much faster. Do you remember roughly how many seconds one 128-image batch took in your training? It takes me about 60 seconds per batch.

Thank you.

I am sorry to bring up something unrelated to your question, but you seem to be the only one active on this project these days, so I want to ask: have you extracted features successfully using the author's script?
Thanks a lot! I have just begun working on this project.

Hi @cloudandcat ,

Yes, I am training it now. The author's code is the base on which I wrote some additional scripts to prepare the data.
I am active because I am adapting this code and its ideas for my course project. I just realized it is slow because of the for loop that reloads the data every time a batch is trained. This can be solved by loading all the batches before training starts.

Hey hi

I don't remember the times unfortunately, but yes, the provided data handler is slow. I used to load all batches into memory at once (or as many as would fit), and that was very fast.
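The idea is simply to pay the slow loading cost once, before the training loop, rather than on every iteration. A minimal sketch, assuming a hypothetical handler with a `get_batch(i)` method (the names `preload_batches`, `get_batch`, and `train_on` are illustrative, not this repo's actual API):

```python
# Sketch: cache every batch in RAM up front, then train from the cache.
def preload_batches(data_handler, n_batches):
    cached = []
    for i in range(n_batches):
        x, mask, y = data_handler.get_batch(i)  # slow disk/preprocessing path
        cached.append((x, mask, y))
    return cached

# Training then touches only the in-memory copies:
# batches = preload_batches(handler, n_batches)
# for epoch in range(max_epochs):
#     for x, mask, y in batches:
#         train_on(x, mask, y)
```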

Hi @thanh-ng, @kracwarlock

Here is the output of my test (including the execution params). I'm running on a GTX 980 Ti GPU with 16 GB of RAM:

```
sudo THEANO_FLAGS='floatX=float32,device=gpu,mode=FAST_RUN,nvcc.fastmath=True' python -m scripts.evaluate_ucf11
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, CuDNN not available)
GPU Lock Acquired
Anything printed here will end up in the output directory for job #0

{'decay_c': [1e-05], 'patience': [10], 'n_layers_init': [1], 'dim_out': [512], 'max_epochs': [3], 'dispFreq': [20], 'validFreq': [100], 'temperature_inverse': [1], 'reload': [False], 'n_layers_att': [1], 'fps': [30], 'ctx_dim': [1024], 'valid_batch_size': [128], 'n_actions': [11], 'training_stride': [1], 'optimizer': ['adam'], 'alpha_c': [0.0], 'dictionary': [None], 'learning_rate': [0.0001], 'batch_size': [128], 'selector': [False], 'last_n': [30], 'dataset': ['ucf11'], 'ctx2out': [False], 'dim': [512], 'use_dropout': [True], 'testing_stride': [1], 'n_layers_out': [1], 'maxlen': [30], 'model': ['model_ucf11.npz'], 'saveFreq': [100]}

Booting up all data handlers
Dataset size 70370
Dataset size 70370
Dataset size 86981
Dataset size 58612
Data handlers ready

Building model

Optimization

Epoch 0
Epoch 0 Update 20 Cost 2366.36303711 PD 3.90953922272 UD 1.55029010773
Epoch 0 Update 40 Cost 897.009155273 PD 3.23121905327 UD 1.48270797729
Epoch 0 Update 60 Cost 294.29006958 PD 2.84509396553 UD 1.40387296677
Epoch 0 Update 80 Cost 216.007827759 PD 2.76406693459 UD 1.41400504112
Epoch 0 Update 100 Cost 62.8925018311 PD 2.7870850563 UD 1.43792915344
Saving... Done
```

Hi @kracwarlock, @GerardoHH
Sorry for the late reply, and thank you for your answers. In case someone is still active on this project and wants to improve the data-loading time: what I did was keep the loop, but instead of loading data on every iteration, the loop only computes indices; I then slice the data with those indices once, outside the loop.
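A minimal sketch of that index-first pattern, assuming the data live in one NumPy array (the names `gather_clips`, `features`, `start_frames`, and `clip_len` are illustrative, not this repo's code):

```python
import numpy as np

def gather_clips(features, start_frames, clip_len):
    """Collect integer indices inside the loop, slice the array only once."""
    indices = []
    for start in start_frames:                  # cheap: integers only
        indices.extend(range(start, start + clip_len))
    # one vectorized fancy-indexing slice instead of a copy per iteration
    return features[np.asarray(indices)]
```

The loop body then does no array copying at all, so the per-batch cost collapses to a single NumPy gather.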

Best,
-Thanh