nitishsrivastava / deepnet

Implementation of some deep learning algorithms.

reproduce multimodal dbm result

xcszbdnl opened this issue

Hello, everyone.
I'm trying to reproduce the multimodal DBM results. However, @nitishsrivastava didn't give an example of a multimodal DBM, only an example of a multimodal DBN.
So I have written the running scripts, using the model files he provides at
http://www.cs.toronto.edu/~nitish/multimodal/, and fixed some bugs in them. For example, deepnet.proto does not have the parameter "mcmc_steps"; it has been renamed to "mf_steps"...
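For anyone hitting the same mismatch, here is a minimal sketch of that rename (the directory name and the .pbtxt extension are my assumptions about how the model files were unpacked; adjust the path to your setup):

```python
# Hedged sketch: rename the obsolete "mcmc_steps" field to "mf_steps" in
# all plain-text protobuf configs under a directory. The directory name
# and the .pbtxt extension are assumptions, not deepnet conventions.
import glob
import os

def patch_configs(model_dir):
    for path in glob.glob(os.path.join(model_dir, '*.pbtxt')):
        with open(path) as f:
            text = f.read()
        if 'mcmc_steps' in text:
            with open(path, 'w') as f:
                f.write(text.replace('mcmc_steps', 'mf_steps'))
            print('patched', path)

patch_configs('multimodal_dbm/models')
```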
However, the model couldn't reproduce the results Nitish reports in his paper, so maybe there are still some bugs in it. I have been debugging for a few weeks and cannot fix it.
So, is there anyone who can cooperate with me to fix it? Then it could be merged into the master branch to help others reproduce the multimodal DBM results.
I have forked the code and started a new branch at multimodal_dbm_example_branch

Hello @xcszbdnl.
What kind of changes did you make to produce your multimodal DBM example? It looks like you may have copied the multimodal DBN code and changed it to use the files @nitishsrivastava provided for multimodal DBMs (and fixed a few bugs, as you said); is that correct? Also, what errors are you getting? Is the only known error that the results are not as good as those in the paper?

I have a few thoughts on why the results might not be as good. First, @nitishsrivastava may have done some fine-tuning of hyperparameters on his deepnet model that is not reflected in the code he provided and which gives better results. Second, the training (i.e. runall_dbm.sh) may have to be modified more thoroughly. From my understanding, one of the big differences between DBNs and DBMs is the training procedure. DBNs are trained as a stack of RBMs, I believe, completely training each RBM one at a time before moving to the next in the stack. DBMs, however, train more fluidly, as a unit, so that the training of any given layer can affect the training of the other layers, both above and below it. Perhaps by analyzing the differences between deepnet's DBN code and its DBM code, we can find out how we need to write runall_dbm.sh to reproduce the results in the paper.
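To make that difference concrete, here is a small numpy sketch of the two regimes (biases, the doubled-weight trick at layer boundaries, and the PCD negative phase are all omitted; `train_dbn_stack` and `dbm_mean_field` are my own illustrative names, not deepnet's API):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_dbn_stack(data, layer_sizes, epochs=5, lr=0.01):
    """DBN: greedy layer-wise CD-1; each RBM is fully trained, then its
    (frozen) hidden activations become the input to the next RBM."""
    reps, weights = data, []
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((reps.shape[1], n_hid))
        for _ in range(epochs):
            h = sigmoid(reps @ W)                              # positive phase
            h_sample = (h > rng.random(h.shape)).astype(float)
            v_neg = sigmoid(h_sample @ W.T)                    # reconstruction
            h_neg = sigmoid(v_neg @ W)                         # negative phase
            W += lr * (reps.T @ h - v_neg.T @ h_neg) / len(reps)
        weights.append(W)
        reps = sigmoid(reps @ W)  # lower layers never change again
    return weights

def dbm_mean_field(v, W1, W2, steps=10):
    """DBM: joint inference; h1 receives input from BOTH v (below) and
    h2 (above), so every layer's statistics depend on the whole model."""
    h1 = sigmoid(v @ W1)
    h2 = sigmoid(h1 @ W2)
    for _ in range(steps):
        h1 = sigmoid(v @ W1 + h2 @ W2.T)
        h2 = sigmoid(h1 @ W2)
    return h1, h2
```

The second function is where the fluid, whole-model behaviour lives: during DBM training these mean-field statistics form the positive phase of every layer's update at once, whereas the DBN loop above never revisits a layer once it is trained.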

I have made the following changes:

  1. In DBN training, the DBN is trained as a stack of RBMs, while a DBM uses the pretrained RBMs only as an initialization. So to train a multimodal DBM, the first step is pretraining the RBMs, just as multimodal_dbn does, only with some hyperparameters changed, like up_factor, down_factor, and so on.
  2. A DBM should then be trained as a complete model. Therefore, in addition to pretraining, I added DBM training, which trains the whole DBM jointly. Lines 126-128 do this.
  3. After training the DBM, we should extract features from it. The multimodal DBN only uses the pretrained RBMs' hidden features, but for a DBM we should use features extracted from the DBM itself; lines 128-130 do this. Since @nitishsrivastava did not provide feature extraction from a DBM, I wrote extract_dbm_representation.py to do it (see the sketch after this list).
  4. To sample text, we should use the DBM model, not the DBN model.
    Classification is then the same as for the multimodal DBN.
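Here is the core idea behind my extract_dbm_representation.py, as a hedged numpy sketch (the weight names and the assumption that weights are plain arrays are stand-ins; the real script has to read deepnet's protobuf checkpoints instead, and biases are omitted for brevity):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def multimodal_mean_field(v_img, v_txt, Wi1, Wi2, Wt1, Wt2, Wij, Wtj,
                          steps=10):
    """Mean-field inference over the whole multimodal DBM: image pathway
    (v_img -> hi1 -> hi2), text pathway (v_txt -> ht1 -> ht2), and the
    joint layer j on top of both pathways."""
    # Bottom-up initialization of all layers.
    hi1 = sigmoid(v_img @ Wi1)
    hi2 = sigmoid(hi1 @ Wi2)
    ht1 = sigmoid(v_txt @ Wt1)
    ht2 = sigmoid(ht1 @ Wt2)
    j = sigmoid(hi2 @ Wij + ht2 @ Wtj)
    # Mean-field updates: every layer mixes bottom-up and top-down input.
    for _ in range(steps):
        hi1 = sigmoid(v_img @ Wi1 + hi2 @ Wi2.T)
        hi2 = sigmoid(hi1 @ Wi2 + j @ Wij.T)
        ht1 = sigmoid(v_txt @ Wt1 + ht2 @ Wt2.T)
        ht2 = sigmoid(ht1 @ Wt2 + j @ Wtj.T)
        j = sigmoid(hi2 @ Wij + ht2 @ Wtj)
    return hi1, hi2, ht1, ht2, j
```

These converged mean-field activations are what get saved as the representations, instead of the single bottom-up pass the DBN pipeline uses.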

I didn't get any errors; the whole training procedure looks fine.
I just cannot reproduce the multimodal results. After I extract the representations from the DBM, Hidden Layer 1 on the image side gets 0.42, but Hidden Layer 2 gets only 0.16, and the joint layer gets only 0.12 (like random results). (That is with 20,000 training steps; Nitish uses 2,000,000, but the number of training steps does not seem to be what matters here. I just want to get results around 0.4x or 0.5x first; then I can switch to 2,000,000 steps.)
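For reference, this is roughly how I score a representation (a hedged sketch mirroring the multimodal DBN classification step: one-vs-rest logistic regression plus mean average precision; the file names are placeholders for whatever the extraction step wrote out):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

# Features written by the extraction step, plus binary label matrices;
# all of these paths are placeholders for my local files.
X_train = np.load('image_hidden2_train.npy')
X_test = np.load('image_hidden2_test.npy')
y_train = np.load('labels_train.npy')   # shape (N, num_labels), 0/1
y_test = np.load('labels_test.npy')

# One logistic regression per label, scored by average precision.
aps = []
for k in range(y_train.shape[1]):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train[:, k])
    scores = clf.decision_function(X_test)
    aps.append(average_precision_score(y_test[:, k], scores))
print('MAP over labels:', np.mean(aps))
```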

Hi, xcszbdnl

Your project excites me a lot.
Do you still get low precision with the DBM model?
If you have found a solution, please share it.

Thanks