CoinCheung / BiSeNet

Add bisenetv2. My implementation of BiSeNet

Training fails with error: tools/train_amp.py FAILED

yangaiping opened this issue · comments

I ran the training command you provided:
export CUDA_VISIBLE_DEVICES=0
NGPUS=1
cfg_file=configs/bisenetv2_coco.py
torchrun --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file
and got the following error (screenshot):

Hi,

Would you please show me the full error message?

Sure, thank you. Here is the full error message (two screenshots).

Have you specified the dataset correctly?

Hello, here are my dataset files (two screenshots).

Did you generate train.txt with the method in README.txt?

I just noticed that running this code does generate train.txt and val.txt, but both files are empty (screenshot).

What is in the folder of images and labels?

Why are these label files in txt format? (screenshot)

Are you using the coco-stuff dataset?

I am using the coco2017labels-segments.zip dataset.

I may have found my problem. I will try again with the coco-stuff dataset.

I am now using the correct dataset and the dataset split succeeded, but training still fails. Why is that? (two screenshots)

It seems that you have hidden files in your image/train2017 folder, and likely in your train.txt file as well. Could you check this?
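For readers hitting the same problem, here is a minimal sketch of how one might scan an annotation file for hidden entries such as `.ipynb_checkpoints`. It assumes each line of train.txt has the form `image_path,label_path`; that format, the helper name `find_hidden_entries`, and the sample file names are illustrative assumptions, not taken from this thread.

```python
from pathlib import PurePosixPath


def find_hidden_entries(lines):
    """Return (line_number, line) pairs whose paths contain a hidden component.

    Assumes each non-empty line looks like "image_path,label_path".
    A component is treated as hidden if its name starts with a dot.
    """
    hidden = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        for path in line.split(","):
            parts = PurePosixPath(path.strip()).parts
            if any(p.startswith(".") and p not in (".", "..") for p in parts):
                hidden.append((lineno, line))
                break  # report each line only once
    return hidden


# Hypothetical sample lines, for illustration only.
sample = [
    "images/train2017/000000000009.jpg,labels/train2017/000000000009.png",
    "images/train2017/.ipynb_checkpoints,labels/train2017/.ipynb_checkpoints",
]
print(find_hidden_entries(sample))
```

To check a real file, the same function could be called on an open file handle, for example `find_hidden_entries(open("datasets/coco/train.txt"))`.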

I deleted the hidden .ipynb_checkpoints files and re-ran python tools/gen_dataset_annos.py --dataset coco, then ran the following commands, and a new error appeared (two screenshots).

python tools/check_dataset_info.py --im_root datasets/coco --im_anns datasets/coco/train.txt

What is the output of this?

Why does your coco-stuff have 201 categories? I used coco-stuff with only 171 classes.

You can change n_cat in the config file to 202, if you would like to use your dataset.
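As a general rule, when label IDs start at 0 the number of classes must be larger than the largest label ID that appears in the masks. The sketch below illustrates that arithmetic only. The helper name `min_n_cat` and the ignore value 255 are assumptions made for illustration, and the sample IDs are hypothetical rather than taken from this dataset.

```python
def min_n_cat(label_ids, ignore_label=255):
    """Return the smallest class count that covers every label ID.

    label_ids: iterable of integer label IDs found in the masks.
    ignore_label: ID excluded from training (255 is an assumed default).
    """
    valid = {int(v) for v in label_ids if int(v) != ignore_label}
    if not valid:
        raise ValueError("no valid label IDs found")
    if min(valid) < 0:
        raise ValueError("label IDs must be non-negative")
    # IDs are zero-based, so the count is the largest ID plus one.
    return max(valid) + 1


# Hypothetical label IDs, for illustration only.
print(min_n_cat([0, 5, 201, 255]))
```

In practice the IDs would come from reading the label images, which is what a dataset check script would normally report.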

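Thank you. It may be a problem with my dataset. After changing it to 202 and re-running the training command, training seems to run successfully. I would also like to ask how to change iter: 400/180000. 180000 seems very large; how can I reduce this parameter? (screenshot)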

You should modify it in the config file:

max_iter=180000,
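When picking a smaller value, it can help to relate iterations to passes over the dataset. The following is a rough sketch of that arithmetic, assuming `batch_size` means the total number of images per iteration across all GPUs. The function names and the sample numbers are hypothetical.

```python
import math


def iters_for_epochs(num_images, batch_size, epochs):
    """Iterations needed to pass over the dataset `epochs` times."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs


def epochs_for_iters(max_iter, num_images, batch_size):
    """Approximate number of dataset passes covered by `max_iter` iterations."""
    return max_iter * batch_size / num_images


# Hypothetical numbers, for illustration only.
print(iters_for_epochs(num_images=1000, batch_size=16, epochs=10))
print(epochs_for_iters(max_iter=180000, num_images=1000, batch_size=16))
```

If the learning-rate schedule is tied to max_iter, lowering it also shortens that schedule, so results with a much smaller value may differ from the defaults.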

Thank you very much for patiently answering my questions. Thanks again!