originrose / cortex

Machine learning in Clojure

Training on master takes longer than 0.5.0

harold opened this issue

At 0.5.0:

harold@gibson:~/src/cortex$ rm -rf ~/.cortex
harold@gibson:~/src/cortex$ git checkout 00f171f665f2c2778421300d384618728dd454f3
HEAD is now at 00f171f... Release 0.5.0
harold@gibson:~/src/cortex$ cd compute
harold@gibson:~/src/cortex/compute$ time lein test think.compute.nn.train-test

  [snip ...]

Ran 4 tests containing 4 assertions.
0 failures, 0 errors.

real	0m49.554s
user	1m15.276s
sys	3m3.520s

At master:

harold@gibson:~/src/cortex$ rm -rf ~/.cortex/
harold@gibson:~/src/cortex$ git checkout master
Previous HEAD position was 00f171f... Release 0.5.0
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
harold@gibson:~/src/cortex$ git pull
Current branch master is up to date.
harold@gibson:~/src/cortex$ time lein test cortex.compute.nn.train-test

  [snip ...]

Ran 4 tests containing 4 assertions.
0 failures, 0 errors.

real	7m50.945s
user	11m48.408s
sys	45m20.232s

The GPU exhibits a similar difference.
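
To put a number on the gap, here is a rough sketch that converts the two `real` timings above into seconds and computes their ratio. The `to_seconds` helper and the variable names are made up for illustration; only the two timing strings come from the runs above.

```shell
# Hypothetical helper: convert a `time`-style duration such as "7m50.945s"
# into seconds. Assumes the "<minutes>m<seconds>s" format shown above.
to_seconds() {
  echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

old=$(to_seconds "0m49.554s")   # real time at 0.5.0
new=$(to_seconds "7m50.945s")   # real time at master

# Ratio of master to 0.5.0 wall-clock time, rounded to one decimal place.
ratio=$(awk -v o="$old" -v n="$new" 'BEGIN { printf "%.1f", n / o }')
echo "master takes ${ratio}x as long as 0.5.0 (wall clock)"
```

By this measure the wall-clock time on master is roughly 9.5 times that of 0.5.0. This only compares single runs, so it says nothing about run-to-run variance.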

Well, the first thing to do is make sure the tests are the same. I believe they were refactored, and during that step they may have been changed to use more data or more epochs.

Good thought. They both do 4 epochs (for each of double and float).

It could also be "more data". That will take some more investigation, since what's printed doesn't indicate the dataset or batch size.

Hmmm... This looks suspect:
https://github.com/thinktopic/cortex/blob/master/src/cortex/verify/nn/train.clj#L162

It may be verifying against all 10k test observations...

OK, going to make a PR with some potential changes.