dk-liang / CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization



How to train with one device

Mr-zhao765 opened this issue

I am interested in this great work! I would like to know how to train with a single GPU.

You can try running python train_distributed.py directly.
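The logs in this thread start with "Not using distributed mode", which is the message printed by the DETR-style init_distributed_mode helper when no torch.distributed launcher variables are set. Assuming train_distributed.py follows that pattern (not confirmed from CLTR's source here), a plain python launch falls back to a single process roughly like this. The env parameter is a hypothetical addition for testability, and the real helper would also call init_process_group in the distributed branch.

```python
import os
from types import SimpleNamespace


def init_distributed_mode(args, env=None):
    """Pick multi-process or single-process training (DETR-style sketch).

    Distributed mode is enabled only when a launcher such as torchrun has
    exported RANK and WORLD_SIZE; process-group setup is omitted here.
    """
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        args.rank = int(env["RANK"])
        args.world_size = int(env["WORLD_SIZE"])
        args.gpu = int(env.get("LOCAL_RANK", 0))
        args.distributed = True
    else:
        # Plain `python train_distributed.py`: no launcher variables,
        # so training runs as a single process on one GPU.
        print("Not using distributed mode")
        args.distributed = False
    return args


if __name__ == "__main__":
    args = init_distributed_mode(SimpleNamespace(world_size=1), env={})
    print(args.distributed)  # False
```

Under this assumption, nothing extra is needed on the command line for one GPU; the multi-GPU path only activates when a launcher sets those variables.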

Because of my mmcv version, I use MMLogger instead of get_logger in utils.py.
The training log is as follows. Is this right or wrong?

Not using distributed mode
model params: 43.446471
06/16 15:18:42 - mmengine - INFO - model params: = 43.446
best result: 100000.0
06/16 15:18:42 - mmengine - INFO - best result = 100000.000
mae 498.84 mse 589.7925058866042
mae 498.84 mse 589.7925058866042
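The thread does not show the actual change to utils.py. A minimal sketch, assuming the original helper was mmcv's get_logger (which newer mmcv 2.x releases no longer provide; logging moved to mmengine) and that a drop-in with a similar call shape is wanted. The name get_logger, its parameters, and the stdlib fallback below are illustrative assumptions, not CLTR's code. The "mmengine" field in the lines above comes from MMLogger's default logger_name; the fallback prints the Python logger name instead.

```python
import logging

try:
    # Assumption: mmengine is installed alongside mmcv >= 2.0.
    from mmengine.logging import MMLogger
except ImportError:  # mmengine not available: fall back to the standard library
    MMLogger = None


def get_logger(name="cltr", log_file=None, log_level="INFO"):
    """Hypothetical drop-in replacement for the get_logger helper in utils.py."""
    if MMLogger is not None:
        # get_instance returns the existing logger when the name was already used.
        return MMLogger.get_instance(name, log_file=log_file, log_level=log_level)

    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        fmt = logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
            datefmt="%m/%d %H:%M:%S",
        )
        handlers = [logging.StreamHandler()]
        if log_file is not None:
            handlers.append(logging.FileHandler(log_file, mode="w"))
        for handler in handlers:
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    logger.setLevel(log_level)
    return logger


logger = get_logger()
logger.info("model params: = %.3f", 43.446471)
```

Either branch returns a logging.Logger, so existing logger.info(...) calls keep working.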

Thank you, I got it working:
setting local_rank = 0 in config.py is enough.
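The config dump below shows 'gpu_id': '0' and 'local_rank': 0. A minimal sketch, assuming config.py defines these options with argparse; the help strings and the device_for helper are hypothetical. One plausible reason the fix works: with a single visible GPU, the only valid CUDA index is 0.

```python
import argparse


def build_parser():
    # Hypothetical subset of config.py; option names mirror keys in the printed config.
    parser = argparse.ArgumentParser(description="CLTR single-GPU options (sketch)")
    parser.add_argument("--gpu_id", type=str, default="0",
                        help="GPU id string, e.g. '0' (usage assumed)")
    parser.add_argument("--local_rank", type=int, default=0,
                        help="index of this process's GPU among the visible ones")
    return parser


def device_for(args):
    # With one visible GPU, local_rank must be 0 for this to name a real device.
    return f"cuda:{args.local_rank}"


args = build_parser().parse_args([])
print(args.local_rank, device_for(args))  # 0 cuda:0
```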

Not using distributed mode
model params: 43.446471
06/16 15:39:45 - mmengine - INFO - model params: = 43.446
06/16 15:39:45 - mmengine - INFO - {'dataset': 'jhu', 'save_path': './save_file/log_file/debug/', 'workers': 1, 'print_freq': 10, 'start_epoch': 0, 'epochs': 1500, 'pre': None, 'batch_size': 4, 'crop_size': 256, 'lr_step': 1200, 'seed': 1, 'best_pred': 100000.0, 'gpu_id': '0', 'lr': 0.0001, 'weight_decay': 0.0005, 'save': True, 'scale_aug': True, 'scale_type': 0, 'scale_p': 0.3, 'gray_aug': False, 'gray_p': 0.1, 'test_patch': True, 'channel_point': 3, 'num_patch': 1, 'min_num': -1, 'num_knn': 4, 'test_per_epoch': 2, 'threshold': 0.35, 'video_path': './video_demo/1.mp4', 'local_rank': 0, 'lr_backbone': 0.0001, 'lr_drop': 40, 'clip_max_norm': 0.1, 'frozen_weights': None, 'backbone': 'resnet50', 'dilation': False, 'position_embedding': 'sine', 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 500, 'pre_norm': False, 'masks': False, 'aux_loss': True, 'set_cost_class': 2, 'set_cost_point': 5, 'set_cost_giou': 2, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'cls_loss_coef': 2, 'count_loss_coef': 2, 'point_loss_coef': 5, 'giou_loss_coef': 2, 'focal_alpha': 0.25, 'dataset_file': 'crowd_data', 'coco_path': None, 'coco_panoptic_path': None, 'remove_difficult': False, 'output_dir': '', 'device': 'cuda', 'resume': '', 'eval': False, 'num_workers': 2, 'world_size': 1, 'dist_url': 'env:// ', 'master_port': 29501, 'distributed': False, 'train_patch': True}
best result: 100000.0
06/16 15:39:45 - mmengine - INFO - best result = 100000.000
06/16 15:39:45 - mmengine - INFO - best result=100000.000 start epoch=0.000
06/16 15:39:45 - mmengine - INFO - start training!
06/16 15:40:21 - mmengine - INFO - Training Epoch:[0/1500] loss=18.38180 lr=0.000100 epoch_time=35.564
06/16 15:40:21 - mmengine - INFO - begin test
mae 498.84 mse 589.7925058866042
06/16 15:40:29 - mmengine - INFO - Testing Epoch:[0/1500] mae=498.840 mse=589.793 best_mae=498.840
06/16 15:41:02 - mmengine - INFO - Training Epoch:[1/1500] loss=7.89438 lr=0.000100 epoch_time=32.623
06/16 15:41:35 - mmengine - INFO - Training Epoch:[2/1500] loss=6.51217 lr=0.000100 epoch_time=32.710
06/16 15:41:35 - mmengine - INFO - begin test
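To judge whether a run like this is healthy, it helps to pull the numbers out of the log. Over the epochs shown, the training loss falls from 18.38 to 6.51. The "best result: 100000.0" line is the placeholder from best_pred in the config, which the first test replaces (best_mae=498.840). A standard-library sketch follows. The regexes are written against the line format shown above and assume it does not change; parse_log and best_mae_is_consistent are hypothetical helpers.

```python
import re

TRAIN_RE = re.compile(r"Training Epoch:\[(\d+)/\d+\] loss=([\d.]+)")
TEST_RE = re.compile(
    r"Testing Epoch:\[(\d+)/\d+\] mae=([\d.]+) mse=([\d.]+) best_mae=([\d.]+)"
)


def parse_log(text):
    """Collect per-epoch training loss and (mae, mse, best_mae) from log lines."""
    losses, tests = {}, {}
    for line in text.splitlines():
        if m := TRAIN_RE.search(line):
            losses[int(m.group(1))] = float(m.group(2))
        elif m := TEST_RE.search(line):
            tests[int(m.group(1))] = tuple(float(g) for g in m.group(2, 3, 4))
    return losses, tests


def best_mae_is_consistent(tests, initial_best=100000.0):
    """best_mae should be the running minimum of mae, starting from best_pred."""
    best = initial_best
    for epoch in sorted(tests):
        mae, _mse, logged_best = tests[epoch]
        best = min(best, mae)
        if abs(best - logged_best) > 1e-3:
            return False
    return True


sample = """06/16 15:40:21 - mmengine - INFO - Training Epoch:[0/1500] loss=18.38180 lr=0.000100 epoch_time=35.564
06/16 15:40:29 - mmengine - INFO - Testing Epoch:[0/1500] mae=498.840 mse=589.793 best_mae=498.840
06/16 15:41:02 - mmengine - INFO - Training Epoch:[1/1500] loss=7.89438 lr=0.000100 epoch_time=32.623
06/16 15:41:35 - mmengine - INFO - Training Epoch:[2/1500] loss=6.51217 lr=0.000100 epoch_time=32.710"""

losses, tests = parse_log(sample)
print(losses)  # {0: 18.3818, 1: 7.89438, 2: 6.51217}
print(best_mae_is_consistent(tests))  # True
```

Running this on the full log file gives a quick view of the loss trend and confirms that best_mae only ever decreases.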