pengzhiliang / Conformer

Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition

training question

JhihJhe opened this issue · comments

Thanks for your nice work!
I ran into a problem when training from scratch on custom data. The error message is shown below:

D:\dl\Conformer-main>python main.py --model Conformer_small_patch16 --data-set IMNET --batch-size 4 --lr 0.001 --num_workers 0 --data-path ./datasets/test/ --output_dir ./output/test/ --epochs 10
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=4, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='./datasets/test/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=10, eval=False, evaluate_freq=1, finetune='', inat_category='name', input_size=224, lr=0.001, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='Conformer_small_patch16', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=0, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./output/test/', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: Conformer_small_patch16
number of params: 37673424
Start training
Traceback (most recent call last):
  File "main.py", line 375, in <module>
    main(args)
  File "main.py", line 335, in main
    set_training_mode=args.finetune == '' # keep in eval mode during finetuning
  File "D:\dl\Conformer-main\engine.py", line 30, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "D:\dl\Conformer-main\utils.py", line 157, in log_every
    header, total_time_str, total_time / len(iterable)))
ZeroDivisionError: float division by zero

Any help would be appreciated, thanks!

It looks like a problem with the dataset. Did you load it correctly?
You can use print(len(dataset_train)) to check.
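
Note that the traceback divides by len(iterable), which is the number of batches the data loader reports, not the number of images. Here is a minimal sketch of that arithmetic, assuming a standard PyTorch DataLoader over a map-style dataset with no custom batch sampler. The helper name num_batches is made up for illustration, and whether the training loader in this repo sets drop_last=True is an assumption worth checking in main.py:

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Mirror how a DataLoader's len() is derived for a map-style dataset."""
    if drop_last:
        # Incomplete final batch is discarded, so small datasets can yield 0.
        return num_samples // batch_size
    # Incomplete final batch is kept.
    return math.ceil(num_samples / batch_size)


print(num_batches(18, 4, drop_last=True))   # 4
print(num_batches(3, 4, drop_last=True))    # 0, which would trigger the division by zero
print(num_batches(3, 4, drop_last=False))   # 1
```

So it may also help to print the length of the training data loader next to len(dataset_train). If the loader length is 0 while the dataset is not empty, the split seen by the loader is smaller than one full batch.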

Thanks for your answer!
Here is the result of the check: my test image data has 18 images.

The printed result also shows 18 images.
My directory structure is the same as yours:
./datasets/test/
  train/
    fail/
      img1.jpg
    pass/
      img2.jpg
  val/
    fail/
      img3.jpg
    pass/
      img4.jpg
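
To see how those images are distributed across splits and classes, a small standalone check like the following can be run outside the training script. It is only a sketch: it assumes the root/split/class/file layout shown above, the function name count_images is hypothetical, and the set of extensions treated as images is my own choice rather than what the repo's loader accepts:

```python
from pathlib import Path

# Extensions counted as images; this set is an assumption for the sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class/file layout."""
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        per_class = {}
        if split_dir.is_dir():
            # Each subdirectory of the split is treated as one class.
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                per_class[class_dir.name] = sum(
                    1
                    for f in class_dir.iterdir()
                    if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
                )
        counts[split] = per_class
    return counts


if __name__ == "__main__":
    for split, per_class in count_images("./datasets/test/").items():
        print(split, per_class, "total:", sum(per_class.values()))
```

If the train total printed here is smaller than the batch size, or a split comes back empty, that would point to the data path or layout rather than the model.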

Thanks a lot!

Can I use Conformer in an object detection network, for example CenterNet?

@JhihJhe I'm sorry for the late reply. If the dataset is not the problem, I am not sure what the specific cause is. I suggest testing with the ImageNet2012 dataset.

@zhaozhiyi11 Of course you can use Conformer to replace the backbone of CenterNet, but I cannot guarantee its performance. If you run an experiment, you are welcome to report the results. If you encounter a problem, I can also help solve it. Thanks!

Performance

May I ask what performance you got?