brucefan1983 / GPUMD

Graphics Processing Units Molecular Dynamics

Home Page:https://gpumd.org/dev

Fix number of decimals, not digits, in energy_test.out and energy_train.out from NEP training

NicklasOsterbacka opened this issue · comments

The energy_test.out and energy_train.out files written by NEP training seem to use a fixed number of digits per column, namely 6. This becomes an issue when training on data generated by e.g. CP2K, which gives total energies that are much larger in absolute value than those from VASP. The predicted and target energies in these files can then be significantly truncated, leaving the end user very confused when confronted with energy parity plots that make little sense.

An easy workaround for the user is to offset the total energies by e.g. the dataset's mean energy, but it would be nice if this could be avoided. A fix would be to print a fixed number of decimals (e.g. 640.xxxxxx) instead of a fixed number of digits (e.g. 640.xxx).
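To illustrate the difference between the two formatting behaviours, here is a minimal Python sketch. It assumes the current output behaves like C's `%g` with six significant digits, which is a guess based on the observed files and not taken from the GPUMD source; the helper names and energy values are hypothetical.

```python
def fixed_digits(x: float) -> str:
    """Format with six significant digits (like C's %g)."""
    return f"{x:g}"


def fixed_decimals(x: float) -> str:
    """Format with six digits after the decimal point (like C's %.6f)."""
    return f"{x:.6f}"


# Hypothetical energies of increasing magnitude (eV).
for energy in (-3.123456, 640.123456, -10234.567891):
    print(f"{fixed_digits(energy):>12}  {fixed_decimals(energy):>16}")
```

With a fixed number of digits, the larger the energy, the fewer decimals survive; with a fixed number of decimals, the printed resolution stays the same regardless of magnitude.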

The nep executable uses single precision throughout, so one cannot expect higher accuracy than single precision in any output. For CP2K training data, I strongly suggest pre-processing the energies so that they are close to zero. Perhaps this point needs to be emphasized in the documentation.

This means that the energy data output from CP2K must be in double precision before any further processing; otherwise accuracy has already been lost.
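The effect of single precision, and of shifting the energies towards zero while still in double precision, can be sketched as follows. This is only an illustration: the energy values and the helper names `to_float32` and `center_energies` are hypothetical, and subtracting the dataset mean is just one possible choice of offset.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def center_energies(energies):
    """Subtract the dataset mean in double precision; return (shifted, mean)."""
    mean = sum(energies) / len(energies)
    return [e - mean for e in energies], mean


# Hypothetical total energies (eV) with a large absolute value.
raw = [-10234.567891, -10234.512345, -10234.598765]
shifted, mean = center_energies(raw)

# Largest error introduced by storing each value in single precision.
err_raw = max(abs(to_float32(e) - e) for e in raw)
err_shifted = max(abs(to_float32(e) - e) for e in shifted)

print(f"max float32 rounding error, raw:     {err_raw:.2e} eV")
print(f"max float32 rounding error, shifted: {err_shifted:.2e} eV")
```

Near 10^4 eV a single-precision number has a spacing of about 10^-3 eV, so differences of a few meV between structures are largely lost; after the shift the values are close to zero and the rounding error becomes negligible. The shift must be applied before any conversion to single precision.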

Yes, perhaps that is the best strategy! Users would hopefully realize what causes the issue and the solution, but having it in the documentation would be nice.

How about we re-open this issue and close it after the documentation is improved? I am also thinking of giving a warning message if the training energies are too large in absolute value.

Printing a warning, maybe along with a short explanation of the issue, in such cases sounds like a great idea.
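A check along these lines could look like the sketch below. It is not the implementation in GPUMD; the function name, the threshold value and the message text are all hypothetical and only show the idea.

```python
import warnings

# Hypothetical threshold in eV; not taken from GPUMD.
ENERGY_WARN_THRESHOLD = 100.0


def warn_if_energies_large(energies, threshold=ENERGY_WARN_THRESHOLD):
    """Emit a warning and return True if any |energy| exceeds the threshold."""
    largest = max(abs(e) for e in energies)
    if largest > threshold:
        warnings.warn(
            f"Largest |energy| in the training data is {largest:.3f}; "
            "single-precision training and output may lose significant digits. "
            "Consider subtracting a reference energy from the training data.",
            stacklevel=2,
        )
        return True
    return False
```

Calling `warn_if_energies_large` on energies around -10234 eV would trigger the warning, while a dataset already shifted close to zero would pass silently.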

Solved in PR #378.