Evaluation errors
zen-d opened this issue · comments
@wpeebles Hi, when I follow the instructions in #8 to run the evaluation for DiT-XL/2, the following error pops up:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) INVALID_ARGUMENT: activation input is not finite. : Tensor had NaN values
[[{{node 2905231348_876199450/conv_2/CheckNumerics}}]]
[[strided_slice_2/_5]]
(1) INVALID_ARGUMENT: activation input is not finite. : Tensor had NaN values
[[{{node 2905231348_876199450/conv_2/CheckNumerics}}]]
0 successful operations.
0 derived errors ignored.
Do you know how to solve that? Thanks.
@wpeebles Hi, I have checked the saved npz using np.isnan().any(), and it returns False. So it is weird to see this error. BTW, would you be willing to officially release the evaluation code? That would make comparisons fairer and easier.
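For anyone repeating this check, here is a minimal sketch of a slightly broader version. It loops over every array stored in the .npz and tests for both NaN and Inf with np.isfinite, since np.isnan alone would miss infinities. The helper name summarize_npz is hypothetical (it is not part of the DiT or ADM code), and the file name in the usage comment is a placeholder.

```python
import numpy as np

def summarize_npz(path):
    """Return {array_name: (dtype, shape, all_finite)} for each array in an .npz file.

    all_finite is None for non-numeric arrays, where the check does not apply.
    """
    summary = {}
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            if np.issubdtype(arr.dtype, np.number):
                # np.isfinite is False for NaN, +Inf and -Inf
                all_finite = bool(np.isfinite(arr).all())
            else:
                all_finite = None
            summary[name] = (str(arr.dtype), arr.shape, all_finite)
    return summary

# Example usage (placeholder file name):
# for name, info in summarize_npz("samples.npz").items():
#     print(name, info)
```

If the samples turn out to be stored with an integer dtype, they cannot contain NaN at all, which would point toward the evaluation environment rather than the file.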
Sounds like the issue could potentially be with your TensorFlow setup. Are you using TF 2.0+ on GPU? I think older versions aren't supported with ADM's evaluation repo. Are you using the requirements.txt file from their repo? To debug this, it might be a good idea to download one of ADM's hosted .npz files (e.g., their "ADM-G + ADM-U" stats, which you can find in their README) and see if you still run into any issues.
@wpeebles Thanks for your detailed instructions!
Yes, I followed the original requirements.txt file from ADM's official repo. The installed TF version is 2.0+; specifically:
> conda list | grep tensorflow
tensorflow 2.8.1 cuda102py38h32e99bf_0 conda-forge
tensorflow-base 2.8.1 cuda102py38ha005362_0 conda-forge
tensorflow-estimator 2.8.1 cuda102py38h4357c17_0 conda-forge
tensorflow-gpu 2.8.1 cuda102py38hf05f184_0 conda-forge
In addition, I tried the admnet_guided_upsampled_imagenet256.npz file they provide and saw the same error.
Since ADM's stats are giving you the same error, the issue seems very likely to be with either your TF environment or maybe your hardware. Are you running the exact command from the ADM repo?
python evaluator.py VIRTUAL_imagenet256_labeled.npz admnet_guided_upsampled_imagenet256.npz
Not sure if it's helpful, but here's my TF environment, which runs without any issues:
> conda list | grep tensorflow
tensorflow 2.11.0 pypi_0 pypi
tensorflow-estimator 2.11.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.29.0 pypi_0 pypi
You might want to try creating a new environment from scratch using TF's installation instructions. You could also try running some other GPU TensorFlow example code snippets to make sure everything works. Unfortunately, it's hard for me to give debugging advice since the issue is outside of the DiT repo.
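As one possible example of such a snippet, the sketch below runs a small convolution on a chosen device and checks that the output is finite. The reported failure is a CheckNumerics on a conv node, so a non-finite result from this test on the GPU but not on the CPU would suggest a problem with the CUDA/cuDNN side of the install. This is only an assumption-laden sanity check, not part of the ADM evaluator, and the function name conv_output_is_finite is hypothetical.

```python
import numpy as np

def conv_output_is_finite(device="/GPU:0"):
    """Run one small 2D convolution on `device` and report whether the output is finite."""
    import tensorflow as tf  # imported here so the module loads even without TF installed

    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 32, 32, 3)).astype(np.float32)  # NHWC input
    w = rng.standard_normal((3, 3, 3, 8)).astype(np.float32)    # HWIO filter
    with tf.device(device):
        y = tf.nn.conv2d(x, w, strides=1, padding="SAME")
    return bool(np.isfinite(y.numpy()).all())

# Example usage:
# import tensorflow as tf
# print(tf.__version__, tf.config.list_physical_devices("GPU"))
# print("CPU ok:", conv_output_is_finite("/CPU:0"))
# print("GPU ok:", conv_output_is_finite("/GPU:0"))
```

If tf.config.list_physical_devices("GPU") returns an empty list, TensorFlow is not seeing the GPU at all, which is worth fixing before rerunning the evaluator.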