Errors when training the image super-resolution model and the main model

Question

graphl opened this issue 2 years ago · comments

Steps to reproduce

Code version: 3ba4adb
Last updated: Fri Dec 31 12:10:07 2021 +0800

Step 1: build the dataset from the ttf/otf files

Commands:
> cd data_utils

> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split train  
> fontforge -lang=py -script convert_ttf_to_sfd_mp.py --split test

> python write_glyph_imgs.py --split train  
> python write_glyph_imgs.py --split test  
> python write_glyph_imgs.py --split train --img_size=256  
> python write_glyph_imgs.py --split test --img_size=256  

> python write_data_to_pkl.py --split train  
> python write_data_to_pkl.py --split test
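
Before packing the pickle files, it may help to confirm that the 256 px rendering pass produced output for every font. The sketch below assumes each font has its own sub-directory under some root folder and that the 256 px renders are saved there as `imgs_256.npy` (the file name that appears in the fix later in this thread). The helper name `find_fonts_missing_file` and the root path are hypothetical.

```python
import os


def find_fonts_missing_file(root_dir, filename="imgs_256.npy"):
    """Return the sorted sub-directories of root_dir that do not contain filename."""
    missing = []
    for entry in sorted(os.listdir(root_dir)):
        font_dir = os.path.join(root_dir, entry)
        # Only per-font directories are checked; loose files are ignored.
        if os.path.isdir(font_dir) and not os.path.exists(os.path.join(font_dir, filename)):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Placeholder path (an assumption): point this at the folder holding the per-font directories.
    root = "path/to/font_dirs"
    if os.path.isdir(root):
        print(find_fonts_missing_file(root))
```

An empty list would mean every font directory has its 256 px renders, so no placeholder images would be needed when writing the pickle.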

Step 2: train the neural rasterizer:

Commands:

mv ./data/vecfont_dataset_/ ./data/vecfont_dataset/
mv  ./data/vecfont_dataset/train/mean.npz ./data/vecfont_dataset/train/stdev.npz ./data
python train_nr.py --mode train --experiment_name dvf --model_name neural_raster

Step 3: train the image super-resolution model:

Commands:
cp -r ./data/vecfont_dataset ./data/glyphss_dataset
python train_sr.py --mode train --name image_sr

The following error occurs:

----------------- Options ---------------
               batch_size: 2
                    beta1: 0.0
          char_categories: 52
          checkpoints_dir: ./experiments
           continue_train: False
                crop_size: 256
                 dataroot: ./data/glyphss_dataset/
             dataset_mode: aligned
                direction: BtoA
              display_env: main
             display_freq: 400
               display_id: 1
            display_ncols: 4
             display_port: 8097
           display_server: http://172.31.222.102
          display_winsize: 256
                    epoch: latest
              epoch_count: 1
          experiment_name: dvf
                 gan_mode: lsgan
        gauss_temperature: 0
                  gpu_ids: 0
               image_size: 256
                init_gain: 0.02
                init_type: normal
                 input_nc: 1
                  isTrain: True                                 [default: None]
                lambda_L1: 1.0
                load_iter: 0                                    [default: 0]
                load_size: 256
                       lr: 0.002
           lr_decay_iters: 50
                lr_policy: linear
         max_dataset_size: inf
          mix_temperature: 0.0001
                     mode: train
               model_name: main_model
                 n_epochs: 500
           n_epochs_decay: 500
               n_layers_D: 3
                     name: image_sr
                      ndf: 64
                     netD: basic
                     netG: unet_256
                      ngf: 64
               no_dropout: False
                  no_flip: True
                  no_html: False
                     norm: instance
              num_threads: 4
                output_nc: 1
                    phase: train
                pool_size: 50
               preprocess: none
               print_freq: 100
             save_by_iter: False
          save_epoch_freq: 25
         save_latest_freq: 5000
           serial_batches: False
                   suffix:
               test_epoch: 125
         update_html_freq: 1000
                  verbose: False
----------------- End -------------------
Loading ./data/glyphss_dataset/train/train_all.pkl pickle file ...
Finished loading
Loading ./data/glyphss_dataset/test/test_all.pkl pickle file ...
Finished loading

initialize network with normal
initialize network with normal
model [Pix2PixModel] was created
---------- Networks initialized -------------
[Network G] Total number of parameters : 54.403 M
[Network D] Total number of parameters : 2.764 M
-----------------------------------------------
/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
learning rate 0.0020000 -> 0.0020000
Traceback (most recent call last):
  File "/home/ubuntu/deepvecfont/train_sr.py", line 29, in <module>
    for i, data in enumerate(dataset_train):  # inner loop within one epoch
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/anaconda3/envs/dvf/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/deepvecfont/dataloader_sr.py", line 30, in __getitem__
    item['rendered_256'] = torch.FloatTensor(cur_glyph['rendered_256']).view(self.char_num, 256, 256) / 255.
KeyError: 'rendered_256'

Fix

Adding the following code at line 93 of data_utils/write_data_to_pkl.py resolves the error:

```
                # Fall back to a blank (all-white) placeholder when the 256 px renders are missing.
                # Note: dataloader_sr.py reshapes this entry to (char_num, 256, 256).
                if not os.path.exists(os.path.join(cur_font_sfd_dir, 'imgs_256.npy')):
                    rendered256 = np.zeros((opts.num_char, opts.img_size, opts.img_size), np.uint8)
                    rendered256[:, :, :] = 255
                    rendered256 = rendered256.tolist()
                else:
                    rendered256 = np.load(os.path.join(cur_font_sfd_dir, 'imgs_256.npy')).tolist()
                merged_res['rendered_256'] = rendered256
```
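
To confirm that the regenerated pickle really carries the new key, one could scan it after rerunning `write_data_to_pkl.py`. This is a minimal sketch, assuming the pickle holds a list of per-font dicts (which is how `dataloader_sr.py` appears to index it in the traceback). The helper name `entries_missing_keys` is hypothetical.

```python
import pickle


def entries_missing_keys(pkl_path, required_keys=("rendered_256",)):
    """Return (index, missing_keys) pairs for entries that lack any required key."""
    with open(pkl_path, "rb") as f:
        all_glyphs = pickle.load(f)  # assumed: a list of dicts, one per font
    problems = []
    for idx, entry in enumerate(all_glyphs):
        missing = [key for key in required_keys if key not in entry]
        if missing:
            problems.append((idx, missing))
    return problems
```

For example, `entries_missing_keys('./data/glyphss_dataset/train/train_all.pkl')` should return an empty list once every font has a `rendered_256` entry. The same check can be run on the test pickle.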

Step 4: train our main model

python main.py --mode train --experiment_name dvf --model_name main_model

The following error occurs:

Epoch: 899/2000, Batch: 0/1, Loss: 0.913249, img_l1_loss: 0.168880, kl_loss: 0.003375, img_pt_c_loss: 0.020982, mdn_loss: 0.461596, softmax_xent_loss: 0.073407, synsvg_nr_recloss: 0.185008
Epoch: 949/2000, Batch: 0/1, Loss: 0.678629, img_l1_loss: 0.123693, kl_loss: 0.006045, img_pt_c_loss: 0.020730, mdn_loss: 0.283389, softmax_xent_loss: 0.115898, synsvg_nr_recloss: 0.128875
Epoch: 999/2000, Batch: 0/1, Loss: 0.780886, img_l1_loss: 0.189140, kl_loss: 0.004557, img_pt_c_loss: 0.024209, mdn_loss: 0.270104, softmax_xent_loss: 0.115126, synsvg_nr_recloss: 0.177751
Traceback (most recent call last):
  File "/home/ubuntu/deepvecfont/main.py", line 358, in <module>
    main()
  File "/home/ubuntu/deepvecfont/main.py", line 352, in main
    train(opts)
  File "/home/ubuntu/deepvecfont/main.py", line 327, in train
    train_main_model(opts)
  File "/home/ubuntu/deepvecfont/main.py", line 167, in train_main_model
    val_img_decoder_out, val_vggpt_loss, val_kl_loss, val_svg_losses, val_trg_img, val_ref_img, val_trgsvg_nr_out, val_synsvg_nr_out = network_forward(val_data, mean, std, opts, network_modules)
  File "/home/ubuntu/deepvecfont/main.py", line 318, in network_forward
    svg_losses = mdn_top_layer.svg_loss(top_output, trg_seq, trg_seqlen+1, opts.max_seq_len)
  File "/home/ubuntu/deepvecfont/models/svg_decoder.py", line 185, in svg_loss
    seqlen_mask = util_funcs.sequence_mask(trg_seqlen, max_seq_len)
  File "/home/ubuntu/deepvecfont/models/util_funcs.py", line 80, in sequence_mask
    .lt(lengths.unsqueeze(1)))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

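Questions

An error occurred in step 3. I modified data_utils/write_data_to_pkl.py by adding the code shown above. Is there any problem with this change?

Is the error in step 4 caused by a problem with the generated data, or by something else? Could you please take a look?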

Yizhi Wang · Answer 1 · Sat Aug 06 2022 22:43:46 GMT+0800 (China Standard Time)

(1) To train the image super-resolution model in pkl mode, put rendered_128 (low resolution) and rendered_256 (high resolution) together in the pkl files. Your code seems correct.
(2) There is a problem with your validation (testing) dataset. Try printing the value of trg_seqlen to find it.
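
The `IndexError` message is consistent with `trg_seqlen` being a zero-dimensional tensor when it reaches `sequence_mask`, since `unsqueeze(1)` on a 0-dim tensor only accepts dimensions in [-1, 0]. One possible cause (an assumption, not confirmed in this thread) is a validation batch with a single font whose batch dimension gets squeezed away. The sketch below is a framework-free helper, with the hypothetical name `describe_seqlen`, that takes the result of `trg_seqlen.tolist()` and summarizes what to look for. Treating lengths outside (0, max_seq_len] as suspicious is also an assumption.

```python
def describe_seqlen(values, max_seq_len):
    """Summarize sequence lengths given as the output of tensor.tolist().

    A bare number (not a list) means the tensor was 0-dimensional, which is
    the case where unsqueeze(1) raises "Dimension out of range".
    """
    if isinstance(values, (int, float)):
        return {
            "is_scalar": True,
            "count": 1,
            "min": values,
            "max": values,
            "out_of_range": int(not 0 < values <= max_seq_len),
        }
    # Flatten arbitrarily nested lists, e.g. shape (batch, 1).
    flat = []
    stack = [values]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)
        else:
            flat.append(item)
    return {
        "is_scalar": False,
        "count": len(flat),
        "min": min(flat) if flat else None,
        "max": max(flat) if flat else None,
        "out_of_range": sum(1 for v in flat if not 0 < v <= max_seq_len),
    }
```

It could be called as `print(describe_seqlen(trg_seqlen.tolist(), opts.max_seq_len))` in `network_forward`, just before the `svg_loss` call. If `is_scalar` is true for the validation batch, the batch dimension was lost somewhere before that point.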