Understanding Ithaca's Evaluation/Checkpointing System

Question

alocaputo opened this issue a year ago

Alessandro Locaputo commented a year ago

Hello all,

I'm seeking clarification on how Ithaca's evaluation/checkpointing system works.

From my understanding, the evaluate function should calculate the evaluation metrics and store the checkpoint's pickle file on disk. However, I'm uncertain about when this function is called.

Currently, when I execute the code, it only generates a log file containing the training loss and the accuracy. The log does not include the validation loss, and no checkpoint is produced.

Also, when I try to run:
python3 experiment.py --config=config.py --jaxline_mode=eval --logtostderr
it says:
Checkpoint None invalid or already evaluated, waiting.

Thank you for your time and assistance.

Best regards,
Alessandro

Pragash Mohanarajah · Answer 1 · Sun Feb 18 2024 04:33:41 GMT+0800 (China Standard Time)

Hello all,

I have also been working independently on Ithaca's Transformer model.
I have been experiencing the same issues when trying to recreate the model.

My understanding of Jaxline suggests that the intermediate training checkpoints are saved to TensorBoard itself, and not as a pickle file. However, towards the end of training, it would be helpful to generate a pickle file to save the trained model.
I believe that training works relatively well, creating a TensorBoard log in the selected checkpoint directory.
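A minimal sketch of what such a final save could look like, assuming the trained parameters are available as a nested dictionary. The names save_params and load_params are hypothetical and are not part of Ithaca or jaxline; with real JAX arrays one would probably copy them to host memory first (for example with jax.device_get) before pickling.

```python
import pickle
from pathlib import Path


def save_params(params, path):
    """Write a parameter tree to disk as a pickle file (hypothetical helper)."""
    path = Path(path)
    # Create the checkpoint directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(params, f)
    return path


def load_params(path):
    """Read the parameter tree back from disk (hypothetical helper)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Calling save_params once after the last training step, with a path inside the checkpoint directory, would leave a file that a separate evaluation or inference script can load.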

However, as described above, the evaluation sequence continues to fail, regardless of how it is run.
There seems to be an issue with how the checkpoint directory is chosen; the model logs are not being saved as a result.
I have tried various adjustments to the original experiment.py file, but have had no success in building and saving a model.
In the parallel training/evaluation mode of jaxline, similar issues arose.
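To illustrate the symptom, here is a simplified sketch of an evaluation loop that polls for a checkpoint. This is not jaxline's actual code: wait_for_checkpoint and get_latest are hypothetical names, and the sketch only assumes that the evaluator keeps waiting whenever no new checkpoint is visible to it.

```python
import time


def wait_for_checkpoint(get_latest, last_evaluated=None, max_polls=3,
                        sleep_seconds=0.0):
    """Poll until `get_latest` returns a checkpoint that was not yet evaluated.

    `get_latest` is a hypothetical callable returning a checkpoint id, or
    None when nothing has been saved where the evaluator can see it.
    """
    messages = []
    for _ in range(max_polls):
        ckpt = get_latest()
        if ckpt is None or ckpt == last_evaluated:
            # Nothing new to evaluate: log and try again later.
            messages.append(
                f"Checkpoint {ckpt} invalid or already evaluated, waiting.")
            time.sleep(sleep_seconds)
            continue
        return ckpt, messages
    return None, messages
```

If the training process never writes a checkpoint to a location the evaluation process can read, get_latest keeps returning None and the loop never gets past the waiting message, which matches the behaviour reported above.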

Could you please update the experiment.py file and any other associated files so that they work together?
It would also be helpful to know the exact package environment that worked during training and evaluation. New package versions have been released since Ithaca came out, often with breaking changes.
An updated requirements.txt could resolve these difficulties.
I have found that Ithaca works seamlessly when installed normally, but struggles in --editable mode.
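For comparing environments, a small script along these lines can report which versions are installed. The package list is an assumption about what the training code depends on and should be adjusted to match the actual requirements.txt; installed_versions is a hypothetical helper name.

```python
from importlib import metadata

# Assumed package list; adjust it to match the actual requirements.txt.
PACKAGES = ["jax", "jaxlib", "jaxline", "dm-haiku", "optax", "tensorflow"]


def installed_versions(packages=PACKAGES):
    """Return a mapping from package name to installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Posting this output alongside an error report makes it easier to see whether a failure comes from a newer, incompatible package release.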

Thank you very much for your time and consideration.
I look forward to hearing from you on this platform.

Kind Regards,
Pragash Mohanarajah