Sub-directories of the experiments folder are not created immediately
lantel-wm opened this issue
Once training starts, the sub-directory for the current experiment is not created under the experiments folder right away. The sub-directory for the previous experiment only appears after the next training run is launched. Everything else works fine. The training command is:
python basicsr/train.py -opt options/train/SwinIR/train_SwinIR_meta_upscale.yml
How can I fix this? Thanks!
The configuration file is as follows:
# general settings
name: train_SwinIR_SR_meta_upscale_scratch_P48W8_t2m_B1G4
model_type: SwinIRModel
# scale: 4
num_gpu: 1
manual_seed: 0

# dataset and data loader settings
datasets:
  train:
    name: t2m_train
    type: t2mDataset
    dataroot_gt: /mnt/ssd/sr/datasets/t2m_1940_1950/y
    dataroot_lq: /mnt/ssd/sr/datasets/t2m_1940_1950/x
    start_date: 19400101
    end_date: 19481231
    mean: 275.90152
    std: 23.808582
    io_backend:
      type: disk

    # data loader
    num_worker_per_gpu: 16
    batch_size_per_gpu: 1
    dataset_enlarge_ratio: 1
    prefetch_mode: ~

  val:
    name: t2m_val
    type: t2mDataset
    dataroot_gt: /mnt/ssd/sr/datasets/t2m_1940_1950/y
    dataroot_lq: /mnt/ssd/sr/datasets/t2m_1940_1950/x
    start_date: 19490101
    end_date: 19501231
    mean: 275.90152
    std: 23.808582
    io_backend:
      type: disk

# network structures
network_g:
  type: SwinIRMetaUpsample
  upscale_v: !!float 7.510416666666667
  upscale_h: !!float 10
  in_chans: 1
  img_size: [96, 144]
  window_size: 8
  img_range: 1.
  depths: [6, 6, 6, 6, 6, 6]
  embed_dim: 90
  num_heads: [6, 6, 6, 6, 6, 6]
  mlp_ratio: 2
  upsampler: 'meta'
  resi_connection: '1conv'

# path
path:
  pretrain_network_g: ~
  strict_load_g: true
  resume_state: ~

# training settings
train:
  ema_decay: 0.999
  optim_g:
    type: Adam
    lr: !!float 2e-4
    weight_decay: 0
    betas: [0.9, 0.99]

  scheduler:
    type: MultiStepLR
    milestones: [250000, 400000, 450000, 475000]
    gamma: 0.5

  total_iter: 500000
  warmup_iter: -1 # no warm up

  # losses
  pixel_opt:
    type: L1Loss
    loss_weight: 1.0
    reduction: mean

# validation settings
val:
  val_freq: !!float 5e3
  save_img: true

  metrics:
    psnr: # metric name, can be arbitrary
      type: calculate_psnr
      crop_border: 4
      test_y_channel: false

# logging settings
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 5e3
  use_tb_logger: true
  wandb:
    project: ~
    resume_id: ~

# dist training settings
dist_params:
  backend: nccl
  port: 29500
I figured it out. An ongoing experiment is saved in the directory 'experiments/train_[exp_name]'. Once an experiment is finished, all of its output is moved to the directory 'experiments/train_archived_[timestamp]'. I was only looking at the directories with the [timestamp] suffix. Sorry for the disturbance!
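For anyone hitting the same confusion, here is a minimal sketch of how this kind of "archive the old folder" behaviour can arise. It is an illustration only, not BasicSR's actual code. The helper name `make_experiment_dir`, the demo folder name `train_demo`, and the `[path]_archived_[timestamp]` naming are all assumptions. The sketch also assumes the rename happens when a new run starts and finds the folder already there, which would match the timing described in the original report.

```python
import os
import tempfile
import time


def make_experiment_dir(path):
    """Create `path`; if it already exists, rename the old folder first.

    Hypothetical helper for illustration. Returns the archived path,
    or None when there was nothing to archive.
    """
    archived = None
    if os.path.exists(path):
        # Assumed naming pattern: '<path>_archived_<timestamp>'.
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        archived = f"{path}_archived_{timestamp}"
        os.rename(path, archived)
    os.makedirs(path)
    return archived


# Demo in a throwaway directory so nothing real is touched.
root = tempfile.mkdtemp()
exp_dir = os.path.join(root, "experiments", "train_demo")

first = make_experiment_dir(exp_dir)   # first run: nothing to archive yet
second = make_experiment_dir(exp_dir)  # second run: first run's folder is renamed

print(sorted(os.listdir(os.path.join(root, "experiments"))))
```

Under these assumptions, the folder without a timestamp always belongs to the most recent run, and a timestamped folder only shows up once a later run with the same name has started.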