Errors when training the image super-resolution model and the main model
graphl opened this issue
Jxyz commented
Steps
Code version: 3ba4adb
Last updated: Fri Dec 31 12:10:07 2021 +0800
Step 1: convert the TTF/OTF files into a dataset:
Commands:
> cd data_utils
> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split train
> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split test
> python write_glyph_imgs.py --split train
> python write_glyph_imgs.py --split test
> python write_glyph_imgs.py --split train --img_size=256
> python write_glyph_imgs.py --split test --img_size=256
> python write_data_to_pkl.py --split train
> python write_data_to_pkl.py --split test
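The errors later in this issue both point back to the generated pkl data, so it can help to inspect it right after this step. A minimal sketch using only the standard library; it assumes (not confirmed here) that train_all.pkl and test_all.pkl unpickle to a list of dicts, and summarize_pkl is a hypothetical helper name:

```python
import pickle
from collections import Counter


def summarize_pkl(path):
    """Return (number of entries, {key: number of entries containing it}).

    Assumption: the pickle holds a list of dicts (one per dataset entry).
    """
    with open(path, "rb") as f:
        entries = pickle.load(f)
    counts = Counter()
    for entry in entries:
        # Count each key once per entry, so missing keys show up as low counts.
        counts.update(entry.keys())
    return len(entries), dict(counts)
```

For example, summarize_pkl(path_to_train_all_pkl) should report 'rendered_256' in every entry before train_sr.py is run; any key with a count below the entry count is missing somewhere.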
Step 2: train the neural rasterizer:
Commands:
mv ./data/vecfont_dataset_/ ./data/vecfont_dataset/
mv ./data/vecfont_dataset/train/mean.npz ./data/vecfont_dataset/train/stdev.npz ./data
python train_nr.py --mode train --experiment_name dvf --model_name neural_raster
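Step 2 moves mean.npz and stdev.npz up to ./data. A small sketch for confirming that both files load and for comparing the arrays they contain; it makes no assumption about the array names inside each archive and simply lists them with their shapes (describe_npz is a hypothetical helper):

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    # np.load on an .npz returns a lazily loaded NpzFile; close it when done.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```

For example, compare describe_npz('./data/mean.npz') with describe_npz('./data/stdev.npz'); the two would normally be expected to have matching shapes.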
Step 3: train the image super-resolution model:
Commands:
cp -r ./data/vecfont_dataset ./data/glyphss_dataset
python train_sr.py --mode train --name image_sr
The following error occurred:
----------------- Options ---------------
batch_size: 2
beta1: 0.0
char_categories: 52
checkpoints_dir: ./experiments
continue_train: False
crop_size: 256
dataroot: ./data/glyphss_dataset/
dataset_mode: aligned
direction: BtoA
display_env: main
display_freq: 400
display_id: 1
display_ncols: 4
display_port: 8097
display_server: http://172.31.222.102
display_winsize: 256
epoch: latest
epoch_count: 1
experiment_name: dvf
gan_mode: lsgan
gauss_temperature: 0
gpu_ids: 0
image_size: 256
init_gain: 0.02
init_type: normal
input_nc: 1
isTrain: True [default: None]
lambda_L1: 1.0
load_iter: 0 [default: 0]
load_size: 256
lr: 0.002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
mix_temperature: 0.0001
mode: train
model_name: main_model
n_epochs: 500
n_epochs_decay: 500
n_layers_D: 3
name: image_sr
ndf: 64
netD: basic
netG: unet_256
ngf: 64
no_dropout: False
no_flip: True
no_html: False
norm: instance
num_threads: 4
output_nc: 1
phase: train
pool_size: 50
preprocess: none
print_freq: 100
save_by_iter: False
save_epoch_freq: 25
save_latest_freq: 5000
serial_batches: False
suffix:
test_epoch: 125
update_html_freq: 1000
verbose: False
----------------- End -------------------
Loading ./data/glyphss_dataset/train/train_all.pkl pickle file ...
Finished loading
Loading ./data/glyphss_dataset/test/test_all.pkl pickle file ...
Finished loading
initialize network with normal
initialize network with normal
model [Pix2PixModel] was created
---------- Networks initialized -------------
[Network G] Total number of parameters : 54.403 M
[Network D] Total number of parameters : 2.764 M
-----------------------------------------------
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
learning rate 0.0020000 -> 0.0020000
Traceback (most recent call last):
File "/home/ubuntu/deepvecfont/train_sr.py", line 29, in <module>
for i, data in enumerate(dataset_train): # inner loop within one epoch
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ubuntu/deepvecfont/dataloader_sr.py", line 30, in __getitem__
item['rendered_256'] = torch.FloatTensor(cur_glyph['rendered_256']).view(self.char_num, 256, 256) / 255.
KeyError: 'rendered_256'
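The failing line in dataloader_sr.py expects every entry to carry a 'rendered_256' value holding char_num images of 256x256 pixels with values in 0..255 (char_num is presumably 52 here, matching char_categories in the options above). A NumPy stand-in for that torch conversion, not the repository's code, makes both requirements explicit; to_sr_target is a hypothetical name:

```python
import numpy as np


def to_sr_target(glyph, char_num, size=256, key="rendered_256"):
    """Mimic dataloader_sr.py: reshape glyph[key] to (char_num, size, size) in [0, 1].

    Raises KeyError if the key is missing (the failure in the log above) and
    ValueError if the stored data does not hold char_num * size * size values.
    """
    if key not in glyph:
        raise KeyError(f"{key!r} missing; keys present: {sorted(glyph)}")
    pixels = np.asarray(glyph[key], dtype=np.float32)
    expected = char_num * size * size
    if pixels.size != expected:
        raise ValueError(f"{key!r} has {pixels.size} values, expected {expected}")
    # Same scaling as the original line: divide 8-bit pixel values by 255.
    return pixels.reshape(char_num, size, size) / 255.0
```

Note that the size check matters as much as the key: a list of 128x128 images stored under 'rendered_256' would get past the KeyError and fail at the reshape instead.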
Fix
I appended the following code at line 93 of data_utils/write_data_to_pkl.py, which resolved the error:
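```python
# Appended at line 93 of data_utils/write_data_to_pkl.py (os and np are already imported there).
# Store the 256x256 renderings, which train_sr.py reads as 'rendered_256'.
imgs_256_path = os.path.join(cur_font_sfd_dir, 'imgs_256.npy')
if not os.path.exists(imgs_256_path):
    # No 256px images for this font: fall back to blank (all-white) 256x256 glyphs.
    # Use 256 explicitly: opts.img_size may not be 256 when this script runs with default options.
    rendered256 = np.full((opts.num_char, 256, 256), 255, np.uint8).tolist()
else:
    rendered256 = np.load(imgs_256_path).tolist()
merged_res['rendered_256'] = rendered256
```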
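The same load-or-fall-back pattern is needed once per resolution, so one option is to factor it into a helper that also checks the array shape. A minimal sketch, assuming the per-font image files are named as in the snippet above (imgs_256.npy for 256 px); load_rendered is a hypothetical helper, not part of the repository:

```python
import os

import numpy as np


def load_rendered(font_dir, filename, num_char, size):
    """Load a (num_char, size, size) uint8 glyph stack as nested lists.

    Falls back to blank (all-white, value 255) glyphs when the file is absent,
    and raises ValueError if an existing file has an unexpected shape.
    """
    path = os.path.join(font_dir, filename)
    if not os.path.exists(path):
        return np.full((num_char, size, size), 255, np.uint8).tolist()
    imgs = np.load(path)
    if imgs.shape != (num_char, size, size):
        raise ValueError(f"{path}: shape {imgs.shape}, expected {(num_char, size, size)}")
    return imgs.tolist()
```

Inside write_data_to_pkl.py this would be called as merged_res['rendered_256'] = load_rendered(cur_font_sfd_dir, 'imgs_256.npy', opts.num_char, 256). Failing loudly on a shape mismatch surfaces bad images when the pkl is written rather than later in a DataLoader worker.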
Step 4: train our main model:
python main.py --mode train --experiment_name dvf --model_name main_model
The following error occurred:
Epoch: 899/2000, Batch: 0/1, Loss: 0.913249, img_l1_loss: 0.168880, kl_loss: 0.003375, img_pt_c_loss: 0.020982, mdn_loss: 0.461596, softmax_xent_loss: 0.073407, synsvg_nr_recloss: 0.185008
Epoch: 949/2000, Batch: 0/1, Loss: 0.678629, img_l1_loss: 0.123693, kl_loss: 0.006045, img_pt_c_loss: 0.020730, mdn_loss: 0.283389, softmax_xent_loss: 0.115898, synsvg_nr_recloss: 0.128875
Epoch: 999/2000, Batch: 0/1, Loss: 0.780886, img_l1_loss: 0.189140, kl_loss: 0.004557, img_pt_c_loss: 0.024209, mdn_loss: 0.270104, softmax_xent_loss: 0.115126, synsvg_nr_recloss: 0.177751
Traceback (most recent call last):
File "/home/ubuntu/deepvecfont/main.py", line 358, in <module>
main()
File "/home/ubuntu/deepvecfont/main.py", line 352, in main
train(opts)
File "/home/ubuntu/deepvecfont/main.py", line 327, in train
train_main_model(opts)
File "/home/ubuntu/deepvecfont/main.py", line 167, in train_main_model
val_img_decoder_out, val_vggpt_loss, val_kl_loss, val_svg_losses, val_trg_img, val_ref_img, val_trgsvg_nr_out, val_synsvg_nr_out = network_forward(val_data, mean, std, opts, network_modules)
File "/home/ubuntu/deepvecfont/main.py", line 318, in network_forward
svg_losses = mdn_top_layer.svg_loss(top_output, trg_seq, trg_seqlen+1, opts.max_seq_len)
File "/home/ubuntu/deepvecfont/models/svg_decoder.py", line 185, in svg_loss
seqlen_mask = util_funcs.sequence_mask(trg_seqlen, max_seq_len)
File "/home/ubuntu/deepvecfont/models/util_funcs.py", line 80, in sequence_mask
.lt(lengths.unsqueeze(1)))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
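The message "expected to be in range of [-1, 0], but got 1" is what PyTorch reports when unsqueeze(1) is called on a 0-dimensional tensor, so trg_seqlen most likely reached sequence_mask as a scalar instead of a 1-D batch of lengths (for example, if a batch of size 1 was squeezed somewhere). Below is a NumPy analogue of a sequence mask, not the repository's util_funcs.sequence_mask; the np.atleast_1d guard that restores the batch dimension is an assumed workaround, not a confirmed fix:

```python
import numpy as np


def sequence_mask(lengths, max_len):
    """Boolean mask of shape (batch, max_len); row i is True for the first lengths[i] steps.

    np.atleast_1d turns a 0-d (scalar) length into a batch of one, which is the
    case that makes lengths.unsqueeze(1) fail in the torch version.
    """
    lengths = np.atleast_1d(np.asarray(lengths))
    if lengths.ndim != 1:
        raise ValueError(f"expected 1-D lengths, got shape {lengths.shape}")
    steps = np.arange(max_len)
    # Compare every step index against every length: (1, max_len) < (batch, 1).
    return steps[None, :] < lengths[:, None]
```

In the training script, printing trg_seqlen.shape just before svg_loss is called should show torch.Size([]) if this is the cause; the guard only hides the symptom, so it is still worth finding why the validation batch lost its batch dimension.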
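Questions
- Regarding the error in step 3: I modified data_utils/write_data_to_pkl.py by appending the code shown above. Is there any problem with changing the code this way?
- Is the error in step 4 caused by a problem with the dataset I built, or by something else? Could you help take a look?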
Yizhi Wang commented
(1) To train the image super-resolution model in pkl mode, put rendered_128 (low resolution) and rendered_256 (high resolution) together in the pkl files. Your code seems correct.
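Following (1), every pkl entry used by the super-resolution model should hold both a low-resolution and a high-resolution rendering. A hedged sketch of a consistency check over loaded entries; the key names and sizes are parameters because the exact keys dataloader_sr.py reads should be confirmed against the code (the snippet in the question stores the low-resolution images under 'rendered'), and check_sr_entries is a hypothetical helper:

```python
import numpy as np


def check_sr_entries(entries, char_num, specs=(("rendered_128", 128), ("rendered_256", 256))):
    """Return a list of problems found in super-resolution pkl entries.

    entries: iterable of dicts (assumed structure); specs: (key, image_size) pairs.
    An empty result means every entry has every key with char_num * size * size values.
    """
    problems = []
    for i, entry in enumerate(entries):
        for key, size in specs:
            if key not in entry:
                problems.append(f"entry {i}: missing {key!r}")
                continue
            n = np.asarray(entry[key]).size
            if n != char_num * size * size:
                problems.append(f"entry {i}: {key!r} has {n} values, expected {char_num * size * size}")
    return problems
```

Running it on both train_all.pkl and test_all.pkl covers the training and validation splits in one pass.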
(2) There was a problem with your validation (testing) dataset. Try printing the value of trg_seqlen to find the problem.