Can't train network on multiple GPUs
adhara123007 opened this issue · comments
Hi,

I am trying to train the network on multiple GPUs, but I get the following error:
```
F0321 13:50:58.896466   271 parallel.cpp:55] Check failed: total_size == (ptr == buffer ? 1 : ptr - buffer) (118335438 vs. 117426126)
*** Check failure stack trace: ***
    @     0x7fb6d90315cd  google::LogMessage::Fail()
    @     0x7fb6d9033433  google::LogMessage::SendToLog()
    @     0x7fb6d903115b  google::LogMessage::Flush()
    @     0x7fb6d9033e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb6d96cf8fd  caffe::GPUParams<>::configure()
    @     0x7fb6d96cfd4b  caffe::P2PSync<>::P2PSync()
    @     0x7fb6d96d1672  caffe::P2PSync<>::Prepare()
    @     0x7fb6d96d1cde  caffe::P2PSync<>::Run()
    @           0x40a80f  train()
    @           0x4075b8  main
    @     0x7fb6d7ae5830  __libc_start_main
    @           0x407d29  _start
    @              (nil)  (unknown)
```
If I use only one GPU (either of the two GPUs in the system), everything seems to run fine.
Sorry, we never used Caffe on multiple GPUs, so I have no experience with that.
Thanks