Training on master takes longer than 0.5.0
At 0.5.0:
harold@gibson:~/src/cortex$ rm -rf ~/.cortex
harold@gibson:~/src/cortex$ git checkout 00f171f665f2c2778421300d384618728dd454f3
HEAD is now at 00f171f... Release 0.5.0
harold@gibson:~/src/cortex$ cd compute
harold@gibson:~/src/cortex/compute$ time lein test think.compute.nn.train-test
[snip ...]
Ran 4 tests containing 4 assertions.
0 failures, 0 errors.
real 0m49.554s
user 1m15.276s
sys 3m3.520s
At master:
harold@gibson:~/src/cortex$ rm -rf ~/.cortex/
harold@gibson:~/src/cortex$ git checkout master
Previous HEAD position was 00f171f... Release 0.5.0
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
harold@gibson:~/src/cortex$ git pull
Current branch master is up to date.
harold@gibson:~/src/cortex$ time lein test cortex.compute.nn.train-test
[snip ...]
Ran 4 tests containing 4 assertions.
0 failures, 0 errors.
real 7m50.945s
user 11m48.408s
sys 45m20.232s
The GPU shows a similar difference as well.
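To put a number on the slowdown, here is a minimal bash sketch that converts the time durations above into seconds and prints the master/0.5.0 ratio for each line. The helper names to_seconds and slowdown are hypothetical, and the sketch assumes the XmY.YYYs format printed by bash's time keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a bash `time` duration such as "7m50.945s" to seconds.
# Assumes the "<minutes>m<seconds>s" format that bash's `time` prints by default.
to_seconds() {
  local d="$1"
  local min="${d%%m*}"   # everything before the "m"
  local sec="${d#*m}"    # everything after the "m"
  sec="${sec%s}"         # strip the trailing "s"
  awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f", m * 60 + s }'
}

# Hypothetical helper: how many times longer the second duration is than the first.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.2f", b / a }'
}

# Timings copied from the 0.5.0 and master runs above.
echo "real: $(slowdown 0m49.554s 7m50.945s)x"
echo "user: $(slowdown 1m15.276s 11m48.408s)x"
echo "sys:  $(slowdown 3m3.520s 45m20.232s)x"
```

Run as-is, it prints 9.50x for real, 9.41x for user and 14.82x for sys time, so system time grew noticeably faster than wall-clock time.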
Well, the first thing to do is to make sure the tests are the same. I believe they were refactored, and during that step they may have been changed to use more data or more epochs.
Good thought. They both do 4 epochs (for each of double and float).
It could also be "more data"; that will take some more investigation, since the printed output doesn't indicate the dataset size or batch size.
Hmmm... This looks suspect:
https://github.com/thinktopic/cortex/blob/master/src/cortex/verify/nn/train.clj#L162
It may be verifying against all 10k test observations...
Maybe 0.5.0 was only training against 100 observations?
https://github.com/thinktopic/cortex/blob/00f171f665f2c2778421300d384618728dd454f3/cortex/src/cortex/verify/nn/train.clj#L153
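One way to check whether the verification step itself changed is to diff that file at both revisions. A minimal sketch, assuming bash and git are available and that it is run from the repository root; the helper name diff_verify_train is hypothetical, and the two paths are taken from the links above because the file moved between 0.5.0 and master.

```shell
#!/usr/bin/env bash
# Paths from the two links above: the file moved between 0.5.0 and master.
OLD_PATH="cortex/src/cortex/verify/nn/train.clj"
NEW_PATH="src/cortex/verify/nn/train.clj"

# Hypothetical helper: show how the NN training verification code differs
# between two revisions. `git diff <rev>:<path> <rev>:<path>` compares the
# two file contents (blobs) directly, even though they live at different paths.
diff_verify_train() {
  local old_rev="$1" new_rev="$2"
  git --no-pager diff "${old_rev}:${OLD_PATH}" "${new_rev}:${NEW_PATH}"
}

# Usage (from the cortex repository root):
#   diff_verify_train 00f171f665f2c2778421300d384618728dd454f3 master
```

Any change to how many test observations are used for verification should show up directly in this diff.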
OK, I'm going to make a PR with some potential changes.