A problem with training at step 1

PeeKaBo0L opened this issue 10 months ago

Hi! Thanks for your outstanding work!
When I was training MagicPoint on Synthetic Shapes, I got the following warnings and error:
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
[10/18/2023 19:35:02 INFO] Running command TRAIN
[10/18/2023 19:35:02 INFO] Number of GPUs detected: 0
[10/18/2023 19:35:05 INFO] Extracting archive for primitive draw_lines.
[10/18/2023 19:35:11 INFO] Extracting archive for primitive draw_polygon.
[10/18/2023 19:35:17 INFO] Extracting archive for primitive draw_multiple_polygons.
[10/18/2023 19:35:23 INFO] Extracting archive for primitive draw_ellipses.
[10/18/2023 19:35:29 INFO] Extracting archive for primitive draw_star.
[10/18/2023 19:35:34 INFO] Extracting archive for primitive draw_checkerboard.
[10/18/2023 19:35:40 INFO] Extracting archive for primitive draw_stripes.
[10/18/2023 19:35:44 INFO] Extracting archive for primitive draw_cube.
[10/18/2023 19:35:50 INFO] Extracting archive for primitive gaussian_noise.
[10/18/2023 19:35:56 INFO] Caching data, fist access will take some time.
[10/18/2023 19:35:58 INFO] Caching data, fist access will take some time.
[10/18/2023 19:35:58 INFO] Caching data, fist access will take some time.
2023-10-18 19:35:58.151225: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2023-10-18 19:35:58.291856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:40:00.0
totalMemory: 10.75GiB freeMemory: 10.59GiB
2023-10-18 19:35:58.291879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2023-10-18 19:35:58.559945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-18 19:35:58.559970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2023-10-18 19:35:58.559974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2023-10-18 19:35:58.560039: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2023-10-18 19:35:58.560067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10224 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:40:00.0, compute capability: 7.5)
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:35:59 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
[10/18/2023 19:36:00 INFO] Scale of 0 disables regularizer.
2023-10-18 19:36:00.439626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-18 19:36:00.439640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]
2023-10-18 19:36:03.561382: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at iterator_ops.cc:947 : Invalid argument: Batch size must be greater than zero.
[[{{node PaddedBatchDatasetV2}} = PaddedBatchDatasetV2[N=4, Toutput_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], output_shapes=[[?,?,?,1], [?,?,?], [?,?,2], [?,?,?]], _device="/device:CPU:0"](RepeatDataset, PaddedBatchDatasetV2/magicpoint/batch_size, PaddedBatchDatasetV2/magicpoint/Const, PaddedBatchDatasetV2/magicpoint/Const_1, PaddedBatchDatasetV2/magicpoint/Const_2, PaddedBatchDatasetV2/magicpoint/Const_1, PaddedBatchDatasetV2/magicpoint/padding_value, PaddedBatchDatasetV2/magicpoint/padding_value_1, PaddedBatchDatasetV2/magicpoint/padding_value, PaddedBatchDatasetV2/magicpoint/padding_value_1, PaddedBatchDatasetV2/magicpoint/drop_remainder)]]
Traceback (most recent call last):
  File "/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/miniconda3/envs/my-env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Batch size must be greater than zero.
[[{{node PaddedBatchDatasetV2}} = PaddedBatchDatasetV2[N=4, Toutput_types=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], output_shapes=[[?,?,?,1], [?,?,?], [?,?,2], [?,?,?]], _device="/device:CPU:0"](RepeatDataset, PaddedBatchDatasetV2/magicpoint/batch_size, PaddedBatchDatasetV2/magicpoint/Const, PaddedBatchDatasetV2/magicpoint/Const_1, PaddedBatchDatasetV2/magicpoint/Const_2, PaddedBatchDatasetV2/magicpoint/Const_1, PaddedBatchDatasetV2/magicpoint/padding_value, PaddedBatchDatasetV2/magicpoint/padding_value_1, PaddedBatchDatasetV2/magicpoint/padding_value, PaddedBatchDatasetV2/magicpoint/padding_value_1, PaddedBatchDatasetV2/magicpoint/drop_remainder)]]
[[{{node magicpoint/OneShotIterator}} = OneShotIterator]]

During handling of the above exception, another exception occurred:

It seems like the batch size is wrong, but I don't know where to fix it. Can you help me? Thank you very much!
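One plausible reading of the log, offered as an assumption rather than a confirmed look at the training code: the script prints "Number of GPUs detected: 0" even though TensorFlow then finds an RTX 2080 Ti. If the script derives its GPU count from CUDA_VISIBLE_DEVICES and multiplies the configured batch size by that count, an unset variable yields a total batch size of 0, which is exactly what PaddedBatchDatasetV2 rejects in the error above. The sketch below models only that failure mode; `count_visible_gpus`, `effective_batch_size`, and the batch size 64 are hypothetical and illustrative, not taken from the project.

```python
import os


def count_visible_gpus(env=None):
    """Count the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper: assumes the training script counts GPUs this way,
    so an unset or empty variable yields 0 GPUs.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    # Ignore empty entries so "" counts as 0 GPUs rather than 1.
    return len([d for d in value.split(",") if d.strip()])


def effective_batch_size(per_gpu_batch_size, env=None):
    """Hypothetical: total batch size = per-GPU batch size * number of GPUs."""
    return per_gpu_batch_size * count_visible_gpus(env)


# Variable unset -> 0 GPUs -> batch size 0, the value the log shows being
# rejected with "Batch size must be greater than zero."
print(effective_batch_size(64, env={}))  # 0
# CUDA_VISIBLE_DEVICES=0 -> 1 GPU -> the configured batch size.
print(effective_batch_size(64, env={"CUDA_VISIBLE_DEVICES": "0"}))  # 64
```

Under this assumption, any non-empty device list makes the batch size positive again.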

PeeKaBo0L · Answer 1 · Wed Oct 18 2023 20:19:00 GMT+0800 (China Standard Time)

Before training, run the following in the terminal:
export CUDA_VISIBLE_DEVICES=0
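If training is launched from another Python script instead of an interactive shell, the same setting can be passed to the child process environment, which has the same effect as running `export CUDA_VISIBLE_DEVICES=0` in the shell that starts training. This is a minimal sketch under that assumption: `env_with_visible_gpus` and `run_with_visible_gpus` are hypothetical helpers, and the training command itself is left as a placeholder because it is not given in this thread.

```python
import os
import subprocess


def env_with_visible_gpus(base_env=None, devices="0"):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    A non-empty existing value is kept; otherwise `devices` is used,
    matching `export CUDA_VISIBLE_DEVICES=0` by default.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not env.get("CUDA_VISIBLE_DEVICES", "").strip():
        env["CUDA_VISIBLE_DEVICES"] = devices
    return env


def run_with_visible_gpus(cmd, devices="0"):
    """Run `cmd` (a list of arguments) with CUDA_VISIBLE_DEVICES set."""
    return subprocess.run(cmd, env=env_with_visible_gpus(devices=devices))


# Example (placeholder command, substitute your own training invocation):
# run_with_visible_gpus(["python", "<your training script>", "<args>"])
```

Keeping an existing non-empty value means the wrapper never overrides a device selection you have already made.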