training question
JhihJhe opened this issue
Thanks for your nice work!
I ran into a problem when training from scratch on custom data. The error message is shown below:
D:\dl\Conformer-main>python main.py --model Conformer_small_patch16 --data-set IMNET --batch-size 4 --lr 0.001 --num_workers 0 --data-path ./datasets/test/ --output_dir ./output/test/ --epochs 10
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=4, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='./datasets/test/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=10, eval=False, evaluate_freq=1, finetune='', inat_category='name', input_size=224, lr=0.001, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='Conformer_small_patch16', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=0, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./output/test/', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: Conformer_small_patch16
number of params: 37673424
Start training
Traceback (most recent call last):
  File "main.py", line 375, in <module>
    main(args)
  File "main.py", line 335, in main
    set_training_mode=args.finetune == '' # keep in eval mode during finetuning
  File "D:\dl\Conformer-main\engine.py", line 30, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "D:\dl\Conformer-main\utils.py", line 157, in log_every
    header, total_time_str, total_time / len(iterable)))
ZeroDivisionError: float division by zero
Any help would be appreciated, thanks!
It looks like a problem with the dataset. Did you load the dataset correctly? You can use print(len(dataset_train)) to check.
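The traceback ends in total_time / len(iterable), so the data loader most likely reports zero batches. A minimal sketch for checking this without starting training is given here. It rests on assumptions that are not confirmed in this thread: that the IMNET option reads images from a train subfolder laid out as <class>/<image>, and that the training loader drops the last incomplete batch. Both helper functions and the list of image extensions are hypothetical, not part of the repository.

```python
import math
from pathlib import Path

# Assumption: these are the image types present in the custom dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images_per_class(split_dir):
    """Return {class_name: image_count} for a <split_dir>/<class>/<image> layout."""
    split_dir = Path(split_dir)
    counts = {}
    if not split_dir.is_dir():
        # A missing folder is reported as "no classes" instead of raising.
        return counts
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def batches_per_epoch(num_samples, batch_size, drop_last):
    """Number of batches a loader yields per epoch for the given settings."""
    if drop_last:
        # Incomplete final batch is discarded, so fewer samples than
        # batch_size means zero batches.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # Path and batch size taken from the command in the question;
    # the "train" subfolder is an assumption.
    counts = count_images_per_class("./datasets/test/train")
    total = sum(counts.values())
    print("classes:", len(counts), "images:", total)
    print("batches per epoch:", batches_per_epoch(total, batch_size=4, drop_last=True))
```

If this prints zero batches, the dataset is either empty, in a different folder layout, or smaller than one batch. It may also be worth checking whether the sampler used with repeated_aug=True rounds the number of samples down for very small datasets. That is a guess to verify in the repository's sampler code, not something established here.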
Can I use Conformer in an object detection network? For example, CenterNet.
@JhihJhe I'm sorry for the late reply. If the dataset is not the problem, I am not sure what the specific cause is. I suggest testing with the ImageNet2012 dataset.
@zhaozhiyi11 Of course you can use Conformer to replace the backbone of CenterNet, but I cannot guarantee its performance. If you run an experiment, you are welcome to report the results. If you run into a problem, I can also help solve it. Thanks!
Performance
May I ask how your performance turned out?