Batch size less than 12 gives an error
aldinorizaldy opened this issue
Hi @Gofinge,
Sorry to bother you with this basic question. If I use the S3DIS data with batch_size = 12 in the config file (the default value), training works perfectly. But if I reduce the batch size, I get an error.
I also have to set the batch size to 1 when I use another dataset (Vaihingen 3D), which has only one point cloud in the train split. Otherwise I get an error similar to the one in #163 (comment), because the batch size is larger than the number of training samples.
I've searched for this error, but it seems no one else has reported it.
This is the error:
=========> RUN TASK <=========
/opt/conda/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable `OMP_NUM_THREADS` not set. MinkowskiEngine will automatically set `OMP_NUM_THREADS=16`. If you want to set `OMP_NUM_THREADS` manually, please export it on the command line before running a python script. e.g. `export OMP_NUM_THREADS=12; python your_program.py`. It is recommended to set it below 24.
  warnings.warn(
Traceback (most recent call last):
  File "exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/tools/train.py", line 38, in <module>
    main()
  File "exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/tools/train.py", line 27, in main
    launch(
  File "/home/rizald42/containers/Pointcept/exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/pointcept/engines/launch.py", line 74, in launch
    mp.spawn(
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/rizald42/containers/Pointcept/exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/pointcept/engines/launch.py", line 137, in _distributed_worker
    main_func(*cfg)
  File "/home/rizald42/containers/Pointcept/exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/tools/train.py", line 18, in main_worker
    cfg = default_setup(cfg)
  File "/home/rizald42/containers/Pointcept/exp/vaihingen3d/v3d_semseg-spunet-v1m1-0-base/code/pointcept/engines/defaults.py", line 136, in default_setup
    assert cfg.batch_size % world_size == 0
AssertionError
Thanks!!
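As the AssertionError indicates, batch_size % world_size must be 0 (e.g. 12 % 4 == 0). Here world_size is the number of training processes (typically one per GPU), so the batch size has to be divisible by it.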
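To make the divisibility rule concrete, here is a minimal sketch of the same check outside the training code. The helper names `per_process_batch_size` and `valid_batch_sizes` are hypothetical and are not part of Pointcept. The sketch assumes the total batch size is split evenly across processes, and the world size of 4 is taken from the example in the reply, not from the reporter's actual setup.

```python
def per_process_batch_size(batch_size: int, world_size: int) -> int:
    """Return the share of the batch each process gets (hypothetical helper).

    Mirrors the condition asserted in the traceback:
    batch_size % world_size == 0.
    """
    if batch_size % world_size != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by world_size={world_size}"
        )
    return batch_size // world_size


def valid_batch_sizes(world_size: int, max_batch_size: int) -> list:
    """List every batch size up to max_batch_size that passes the check."""
    return list(range(world_size, max_batch_size + 1, world_size))


if __name__ == "__main__":
    print(per_process_batch_size(12, 4))  # 3
    print(valid_batch_sizes(4, 12))       # [4, 8, 12]
```

One consequence of the rule: a batch size of 1 can only pass the check when world_size is 1, since 1 % world_size is nonzero for any larger world size.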
Thanks!! I did not realize the meaning of the world_size.