omron-sinicx / neural-astar

Official implementation of "Path Planning using Neural A* Search" (ICML-21)

Home Page: https://omron-sinicx.github.io/neural-astar


Metrics visualization during training and evaluation

GigSam opened this issue

On the "minimal" branch, differently from what was done in the example.ipynb file of the previous version of the repo (the one without pytorch lightning, similar to the branch "3-improve-repository-organization"), it seems that you don't use the logs of the Opt, Exp and Hmean metrics when the training is performed. I would like to visualize those metrics, but the "metrics" folder isn't created by running the train.py script. Thank you for your support.

Hi, thank you for your post!

If you open TensorBoard, you can see the progress of those metrics (p_opt, p_exp, and h_mean) as shown here: #4 (comment). Is this what you are looking for?
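
In case it is useful, here is a small sketch (not part of the repo) for reading the logged scalars directly from the event files with TensorBoard's EventAccumulator. The log directory path follows the layout mentioned later in this thread and is otherwise an assumption, so point it at whatever version_* directory you actually have:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Assumed path based on this thread; adjust to your own lightning_logs/version_* dir.
log_dir = "model/mazes_032_moore_c8/lightning_logs/version_1"

ea = EventAccumulator(log_dir)
ea.Reload()  # parse the events.out.tfevents.* file(s) in that directory

# List whatever scalar tags were actually logged (e.g. p_opt, p_exp, h_mean).
print("scalar tags:", ea.Tags()["scalars"])

for tag in ea.Tags()["scalars"]:
    events = ea.Scalars(tag)  # each event has .step and .value
    print(tag, [(e.step, round(e.value, 4)) for e in events[:5]], "...")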

Hi 😀
I am facing the same issue: I am not able to see the training progress (loss and metrics) because no log files are generated. Is this normal?

Thank you!

Thank you! At least when I was working on #9, all the metrics were logged as intended. I will look into it.

Hi! I've been investigating this issue but am having difficulty reproducing it. If I clone the repository, create a venv, and run train.py, the metrics are logged to TensorBoard as follows.

[screenshot: TensorBoard showing the logged metrics]

My environment is with:

  • WSL2 (Ubuntu 20.04) on Windows 11
  • venv created with python==3.8
  • tensorboard==2.11.0
  • pytorch-lightning==1.8.5.post0

I will try other environments and module versions, but would it be possible to share your environment and the versions of the related modules that cause this logging issue (perhaps the tensorboard and pytorch-lightning versions matter)? Or did you get any warning messages about logging failures? @GigSam @luigidamico100
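
For anyone sharing their setup here, a quick way to dump the relevant versions (a minimal sketch; it only assumes the packages are installed in the same venv used for train.py):

import sys
from importlib.metadata import version, PackageNotFoundError

print("python:", sys.version.split()[0])
for pkg in ("torch", "pytorch-lightning", "tensorboard"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")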

Thank you!

@yonetaniryo my environment is:

  • Windows 11
  • venv created with python==3.10.9
  • tensorboard==2.10.1
  • pytorch-lightning==1.8.5.post0

The problem is that after cloning the repo, creating and activating the venv, and running train.py, I don't see any "metrics" folder or any log produced by the training script, even though the algorithm works fine and no logging warning is produced. I really don't know what's causing this issue.

Thank you for sharing your environment. I just wanted to make sure that, for mazes_032_moore_c8, the logs are stored in model/mazes_032_moore_c8/lightning_logs/version_*, not in a metrics folder. Also, to reduce the repository size, only the checkpoint in model/mazes_032_moore_c8/lightning_logs/version_0 is kept on GitHub. When you clone the repo and start the training, the following directory and files should appear:

model/mazes_032_moore_c8/lightning_logs/version_1:
checkpoints  events.out.tfevents....  hparams.yaml
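
For context, this layout is what PyTorch Lightning's default TensorBoardLogger produces when the Trainer is given a root directory. The snippet below is a minimal self-contained sketch of that behaviour, not the repo's actual train.py; the module, data, and metric name are made up for illustration.

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # self.log(...) is what ends up as scalars in events.out.tfevents.*
        self.log("h_mean", loss)  # placeholder metric name for illustration
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)

data = DataLoader(TensorDataset(torch.randn(32, 4), torch.randn(32, 1)), batch_size=8)

# With only default_root_dir set, Lightning writes to
# <default_root_dir>/lightning_logs/version_<N>/ (N auto-increments per run),
# which is why a fresh run appears as version_1 next to the shipped version_0.
trainer = pl.Trainer(default_root_dir="model/mazes_032_moore_c8", max_epochs=1)
trainer.fit(ToyModule(), data)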

I have checked the logging in an environment as close as possible to that of @GigSam, with python 3.10.9 and tensorboard==2.10.1. However, I'm still not able to reproduce the issue. Can you double-check whether the logs are stored in the model directory? Alternatively, you may try using our Dockerfile, which will give us exactly the same environment. Thank you!
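
A quick way to double-check programmatically whether any event files were written under the model directory (a sketch; the glob pattern simply mirrors the layout described above):

from pathlib import Path

# Look for TensorBoard event files anywhere under the model directory.
hits = sorted(Path("model").glob("**/lightning_logs/version_*/events.out.tfevents.*"))
if hits:
    for p in hits:
        print(p, f"({p.stat().st_size} bytes)")
else:
    print("no event files found under ./model -- logging did not happen or went elsewhere")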

Sorry, but I'm going to close this issue because I cannot reproduce the logging problem. If someone encounters the same problem, please check whether the metrics data are stored in the model directory, and please don't hesitate to re-open the issue if you can reproduce the problem. Thank you for the report!