v0.x CI is broken
leezu opened this issue
The CI seems to pass or fail at random.
Failure log:
[2021-02-23T23:41:05.382Z] -- Detecting C compile features - done
[2021-02-23T23:41:05.382Z] CMake Error at /var/lib/jenkins/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
[2021-02-23T23:41:05.382Z] Could NOT find Mxnet (missing: Mxnet_LIBRARIES) (Required is at least
[2021-02-23T23:41:05.382Z] version "1.4.0")
[2021-02-23T23:41:05.382Z] Call Stack (most recent call first):
[2021-02-23T23:41:05.382Z] /var/lib/jenkins/gluon-nlp-gpu-py3/conda/gpu/py3/lib/python3.6/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:458 (_FPHSA_FAILURE_MESSAGE)
[2021-02-23T23:41:05.382Z] cmake/Modules/FindMxnet.cmake:54 (find_package_handle_standard_args)
[2021-02-23T23:41:05.382Z] horovod/mxnet/CMakeLists.txt:12 (find_package)
So far, all failing runs were on ip-172-31-43-211 and ip-172-31-19-212, whereas the successful run was on ip-172-31-22-205. The failures may be caused by a configuration mismatch between these instances.
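One way to test the instance-mismatch hypothesis is to record the build-relevant environment on each Jenkins host and diff the results. The following is a minimal sketch, assuming the hosts can run a plain Python script inside the CI conda environment; `collect_build_env` and `diff_envs` are hypothetical helpers, and the fields they record (interpreter, `cmake` on `PATH`, where `mxnet`/`cmake`/`horovod` resolve) are guesses at what might differ, not checks the CI actually performs.

```python
import importlib.util
import json
import platform
import shutil
import sys


def collect_build_env(packages=("mxnet", "cmake", "horovod")):
    """Gather facts that commonly differ between CI hosts.

    Hypothetical helper: the package list and fields are illustrative
    assumptions, not what the Jenkins setup actually inspects.
    """
    info = {
        "hostname": platform.node(),
        "python": sys.version.split()[0],
        "executable": sys.executable,
        # Path of the cmake binary found on PATH, or None if absent.
        "cmake_on_path": shutil.which("cmake"),
        "packages": {},
    }
    for name in packages:
        spec = importlib.util.find_spec(name)
        # find_spec returns None when the top-level package is not importable;
        # otherwise spec.origin is usually the package's __init__.py path.
        info["packages"][name] = spec.origin if spec is not None else None
    return info


def diff_envs(a, b):
    """Return the fields whose values differ between two collected envs.

    The hostname is deliberately ignored, since it always differs.
    """
    diffs = {}
    for key in ("python", "executable", "cmake_on_path"):
        if a[key] != b[key]:
            diffs[key] = (a[key], b[key])
    for name in sorted(set(a["packages"]) | set(b["packages"])):
        va, vb = a["packages"].get(name), b["packages"].get(name)
        if va != vb:
            diffs["package:" + name] = (va, vb)
    return diffs


if __name__ == "__main__":
    # Print this host's environment as JSON so outputs can be saved and compared.
    print(json.dumps(collect_build_env(), indent=2))
```

Running the script on a failing host and on the passing host, then loading both JSON files and calling `diff_envs`, would show at a glance whether, for example, `mxnet` resolves to a different location (or not at all) on one of them.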
@barry-jin, would you have time to backport the GitHub Actions CI implementation to the v0.x branch? Then we can get rid of all the trouble with Jenkins.
@leezu Sure.