Cadene / vqa.pytorch

Visual Question Answering in Pytorch

Slow data loading?

davidgolub opened this issue · comments

First of all, thank you for open-sourcing your code! It is very useful for my research.

I notice that data loading is quite slow: for most batches, it takes about 80% of the total batch time (8s, 11s, and 4s out of 10s, 13s, and 5s, respectively). Consequently, I can only get through about 4 epochs per day. I use 4 workers and the default options in the repository. Do you have any recommendations on how to speed that part up?

Thanks,
David
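
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the split between data-loading time and compute time per batch could be timed. It is not code from the repository: `time_data_loading`, `slow_batches`, and the dummy step function are hypothetical names used only for illustration, and any iterable of batches (for example a PyTorch `DataLoader`) could stand in for `batches`.

```python
import time


def time_data_loading(batches, step_fn):
    """Return (seconds spent waiting for batches, seconds spent in step_fn).

    `batches` is any iterable of batches; `step_fn` is whatever is done with
    each batch (forward/backward pass, etc.). Both are placeholders.
    Caveat: on a GPU, kernels run asynchronously, so step_fn would need to
    synchronize (e.g. torch.cuda.synchronize()) for the split to be accurate.
    """
    load_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent here is compute
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
    return load_time, step_time


def slow_batches(n, delay):
    """Hypothetical stand-in for a slow data loader: sleeps before each batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


load, step = time_data_loading(slow_batches(3, 0.02), lambda batch: None)
print(f"data loading: {load / (load + step):.0%} of total batch time")
```

If the reported fraction stays high even with more workers, the limit is more likely disk throughput than CPU-side preprocessing.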

commented

Hi,

Thank you for your interest.
I suppose that you are training models with attention. Those models need to load features of size 14x14x2048xbatch_size, which can be challenging.
That is why we trained our models on multiple SSDs in RAID 0 or on PCIe SSDs.
You can try to locate your bottleneck using htop and atop. It could come from your threads or from I/O times.
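
To give a rough sense of the I/O load implied by features of size 14x14x2048xbatch_size, here is a back-of-the-envelope sketch. It assumes the features are stored as 32-bit floats and uses a batch size of 128 purely as an example; neither value is stated in this thread, and the function names are hypothetical.

```python
def attention_feature_bytes(batch_size, height=14, width=14, channels=2048,
                            bytes_per_value=4):
    """Bytes of image features read for one batch.

    bytes_per_value=4 assumes float32 storage (an assumption, not confirmed here).
    """
    return batch_size * height * width * channels * bytes_per_value


def required_read_rate_mib(batch_size, seconds_per_batch, **feature_kwargs):
    """MiB/s the storage must sustain to deliver one batch every seconds_per_batch."""
    mib_per_batch = attention_feature_bytes(batch_size, **feature_kwargs) / (1024 ** 2)
    return mib_per_batch / seconds_per_batch


batch_size = 128  # hypothetical example value
mib = attention_feature_bytes(batch_size) / (1024 ** 2)
print(f"{mib:.0f} MiB of features per batch of {batch_size}")
print(f"{required_read_rate_mib(batch_size, 1.0):.0f} MiB/s needed for one batch per second")
```

Under these assumptions each sample is about 1.5 MiB, so a single spinning disk or a network file system can easily become the bottleneck, which is consistent with the advice to use fast SSD storage.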

Please let me know what you find.

Great, thanks for your feedback!

Another question: I noticed that when training the model for VQA 2.0 (and, if I remember correctly, VQA 1.0 as well) with the default parameters on trainval, the accuracy peaks at around 60%. I assume you have a lot of experience tweaking the hyperparameters in the repo. Do you have any intuition as to why this may be happening? For example, I increased nans from 2k to 3k, but is that the issue? Or is there too much dropout? Any thoughts? The main issue seems to be with questions in the "other" category.

commented

Sorry for the late answer. Are you talking about the training accuracy?
Have you already solved your problem?