Not able to reproduce reported results for adaptive O-CNN classification and autoencoder
jeichelbaum opened this issue
EDIT: managed to reproduce the classification accuracy and autoencoder Chamfer distance as reported in the paper
I am able to reliably reproduce the results for the full O-CNN classification experiment, but not for the adaptive O-CNN classification and autoencoder experiments. I gathered the results you reported and the results I was able to reproduce in the table below:
| | depth 5 | depth 6 | depth 7 |
|---|---|---|---|
| Reported AO-CNN classification accuracy | 90.5% | 90.4% | 90.0% |
| Reproduced Caffe AO-CNN accuracy | 85.12% | 85.68% | 86.18% |
| Reproduced Tensorflow AO-CNN accuracy | 83.38% | 83.72% | 84.31% |
| Reported autoencoder avg. Chamfer dist. | - | - | 1.44 |
| Reproduced autoencoder avg. Chamfer dist. | - | - | 1.77 |
Classification experiment
I am using the ModelNet40 point dataset you provide and followed the instructions exactly as posted in docs/classification. I used both the Caffe and Tensorflow implementations to test this, but didn't get accuracy close to your results. The model I used is the same one you uploaded in caffe/experiments/aocnn_m40_5.prototxt and tensorflow/script/configs/cls_octree.yaml. I attached a visualization of the Caffe model for convenience:
Going through the Tensorflow code raised some questions:
- Does the Tensorflow classifier implementation support adaptive octrees as input?
- Did you use different parameters to construct the octrees other than those mentioned in caffe/experiments/dataset.py?
- What signal size did you use in the adaptive O-CNN classifier experiment? The config files use signal_size=3 (just normal), but from your paper I get the idea that you might have used signal_size = 4 (normal + displacement). I already tried the experiment with both signal_sizes but without success.
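To make sure we are talking about the same signals, here is roughly what I understand the two layouts to mean. This is only a hypothetical sketch of mine (the function `node_signal` and its arguments are my own naming, not the repository's code), assuming the 4th channel is the plane displacement described in the paper:

```python
# Hypothetical sketch, not the repository's code: my understanding of the
# 3-channel vs. 4-channel per-node input signal selected by signal_size.
import numpy as np

def node_signal(points, normals, node_center, signal_size=3):
    """Build the input feature of one octree leaf node from the points inside it."""
    n = normals.mean(axis=0)
    n /= np.linalg.norm(n) + 1e-12            # averaged unit normal -> 3 channels
    if signal_size == 3:
        return n                              # (nx, ny, nz)
    # Assumed 4th channel: signed offset of the local planar patch from the
    # node center, measured along the averaged normal (the "displacement").
    d = np.dot(points.mean(axis=0) - node_center, n)
    return np.concatenate([n, [d]])           # (nx, ny, nz, d)
```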
Autoencoder experiment
Again, I used the ShapeNet point cloud dataset you provided in combination with the provided caffe/experiments/ae_7_4.train.prototxt and executed each step as you describe it in the docs section, but my results are still way off.
My big question is:
Do you have any pointers on how I can get closer to reproducing your experiments? Or would you be so kind as to share the trained models for both the Caffe adaptive O-CNN classifier and the autoencoder?
Thanks in advance
-
For the adaptive O-CNN, the basic components are already implemented in our Tensorflow-based implementation. However, I have not tested the adaptive O-CNN with Tensorflow.
-
I have updated dataset.py and fixed one parameter. Please pull the latest code and have a try. With the depth-5 adaptive O-CNN, the classification accuracy is 89.4% before voting. After using voting strategies, such as the orientation pooling mentioned in our paper, the performance will increase to about 90.5%. I have just run the experiment on my own PC today. The training log and the last caffemodel can be downloaded here. Other results will be released soon.
-
In the classification experiments, we use only the first 3 channels. According to my own experiments, with 4-channel signals, the testing accuracy may drop by about 0.2%, probably due to overfitting. In the autoencoder experiments, 4-channel signals are used.
I implemented voting (a simple average-voting scheme, sketched below) to see if I am able to reproduce the accuracy for the adaptive O-CNN classification experiment. I am seeing some improvement since your last commit, but I am still not able to completely reproduce your results. I am on the latest version of the O-CNN repository and did a clean installation of Ubuntu and Caffe.
| | depth 5 | depth 6 | depth 7 |
|---|---|---|---|
| Reported AO-CNN classification accuracy with orientation pooling | 90.5% | 90.4% | 90.0% |
| Reproduced Caffe AO-CNN classification accuracy with average voting | 89.7% | 89.2% | 89.1% |
| Reproduced Caffe AO-CNN classification accuracy without voting | 89.1% | 88.9% | 88.6% |
I tested the model you uploaded in your previous comment and it performs better than any of my trained models. It achieves 89.4% without and 90.4% with voting on my machine. Is this something that is expected due to random initialization, or do you think I might still be doing something wrong?
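For reference, the "average voting" in the table above follows a simple scheme along these lines (a minimal sketch of my own; `predict` stands for whatever returns class probabilities for one octree, and the octrees of the different orientations are assumed to be pre-built):

```python
import numpy as np

def vote_average(octrees, predict):
    """Average class probabilities of one shape over its pre-built orientations."""
    probs = np.stack([predict(o) for o in octrees])  # (num_orientations, num_classes)
    return int(np.argmax(probs.mean(axis=0)))        # average first, then pick a class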
-
Please compare your training log with the one I uploaded in my last post. You can also run the training several times and compare the accuracies.
-
With a simple average voting strategy, the accuracy may increase by about 0.5%. With orientation pooling (for the detailed implementation, please refer to Section 4.4 of this paper), the accuracy may increase by about 1.0%.
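Roughly speaking (this is only a simplified illustration, not the exact implementation; please still refer to Section 4.4 for the details), average voting combines the final class probabilities, while orientation pooling combines the feature activations of the different orientations before the classifier. Here `features` and `classify` are placeholder names for the two halves of the network:

```python
import numpy as np

def orientation_pool(octrees, features, classify, pool=np.max):
    """Pool per-orientation feature vectors before the final classification."""
    feats = np.stack([features(o) for o in octrees])  # (num_orientations, feat_dim)
    pooled = pool(feats, axis=0)                       # pool across orientations
    return int(np.argmax(classify(pooled)))
```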
-
I ran the experiment multiple times, but the accuracy doesn't change much. The behavior over time is consistent with your training log, but I always seem to miss 0.2-0.3% in accuracy. In my opinion this is tolerable and a lot better than the 85.1% accuracy of my initial training round.
-
Thank you for clarifying and walking me through the process to reproduce your results! Orientation pooling explains the gap in accuracy and concludes the classification experiment for me.
-
I finally managed to reproduce the AO-CNN autoencoder experiment; the resulting Chamfer distance is 1.45.
-
I have pushed the latest code to the master branch.
I automated the process of running the autoencoder experiment with Python. I did the experiment last week, and the results are reproducible. The training log, model, and resulting Chamfer distance can be downloaded here.
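For anyone who wants to sanity-check their numbers, here is a minimal sketch of a symmetric Chamfer distance between two point clouds (this only illustrates the metric itself; the exact definition used for the reported numbers, e.g. squared vs. unsquared distances and any scaling factor, may differ, so please check the released evaluation code):

```python
import numpy as np
from scipy.spatial import cKDTree

def avg_chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two (N, 3) point clouds."""
    d_ab = cKDTree(points_b).query(points_a)[0]   # nearest-neighbour distances A -> B
    d_ba = cKDTree(points_a).query(points_b)[0]   # nearest-neighbour distances B -> A
    return d_ab.mean() + d_ba.mean()
```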