SerialLain3170 / adeleine

Automatic line art colorization using various types of hints, or with no hint at all

Training data for reference_video

jamie212 opened this issue

Hello, your work is excellent! I want to train the 'reference_video' model, but you only described how the data should be placed. May I ask what dataset you used, and where I can download it?

Sorry for the late response. I prepared one episode each from 18 animation titles (about 540 minutes = 30 min × 18), taken from official channels on YouTube.

Thank you for your response. Do you mean that you downloaded 18 animation episodes, each 30 minutes long, from YouTube, and then converted them into frames to extract sketches? May I ask which official channels these were from? Additionally, you mentioned storing the data as distance field images; could you explain how these are obtained?

Thank you for your response. Do you mean that you downloaded 18 animation episodes, each 30 minutes long, from YouTube, and then converted them into frames to extract sketches?

Yes. They were originally from an official channel, but it seems to have stopped providing them. I think any 18 episodes are fine as long as they come from different animations, to ensure the robustness of the model.
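
For context on the "extract sketches" step: one very simple way to approximate line art from color frames is the classic dilate-and-diff trick sketched below. This is only an illustration; it is not necessarily the sketch-extraction method actually used for this repository's training data.

```python
import cv2
import numpy as np

def naive_sketch(color_frame: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Approximate line art from a color frame via the dilate-and-diff trick."""
    gray = cv2.cvtColor(color_frame, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((ksize, ksize), np.uint8)
    dilated = cv2.dilate(gray, kernel, iterations=1)
    # Edges appear as the difference between the dilated image and the original.
    diff = cv2.absdiff(dilated, gray)
    # Invert so that lines are dark on a white background, as in typical line art.
    return 255 - diff

# Example (hypothetical file name): sketch = naive_sketch(cv2.imread("frame_000001.png"))
```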

Additionally, you mentioned storing the data as distance field images; could you explain how these are obtained?

Please refer to #14
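
Since #14 is the reference here, the following is only a minimal sketch of one common recipe for turning a line-art image into a distance field image: threshold the lines, take the Euclidean distance transform of the background, and normalize. The threshold value and the dark-lines-on-white assumption are mine; defer to #14 for the method actually used.

```python
import cv2
import numpy as np
from scipy.ndimage import distance_transform_edt

def sketch_to_distance_field(sketch_path: str, threshold: int = 200) -> np.ndarray:
    """Convert line art into an 8-bit distance field image (one common recipe)."""
    sketch = cv2.imread(sketch_path, cv2.IMREAD_GRAYSCALE)
    # Line pixels are dark; everything brighter than the threshold is background.
    lines = sketch < threshold
    # Distance from each background pixel to the nearest line pixel (0 on the lines).
    dist = distance_transform_edt(~lines)
    # Normalize to [0, 255] so the result can be saved as an ordinary grayscale image.
    if dist.max() > 0:
        dist = dist / dist.max() * 255.0
    return dist.astype(np.uint8)

# Example (hypothetical file names):
# cv2.imwrite("dist_000001.png", sketch_to_distance_field("sketch_000001.png"))
```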

I would like to ask: are you looking for 18 videos, each 30 minutes long? Also, what fps did you set when converting them into frames? At 30 frames per second, a 30-minute video turns into 54,000 frames. Do you then put all 54,000 frames into anime_dir? Wouldn't that be too many? From the paper, it seems they only use very short videos.

I would like to ask: are you looking for 18 videos, each 30 minutes long? Also, what fps did you set when converting them into frames? At 30 frames per second, a 30-minute video turns into 54,000 frames. Do you then put all 54,000 frames into anime_dir?

Yes to all of those questions. I used 30 fps.

Wouldn't that be too many? From the paper, it seems they only use very short videos.

Good question. I wanted to use as much data as possible to improve generalizability when I conducted the experiments, and I did not train the model with varying amounts of data, so I do not have a solid answer to that question. But, as you said, it may well be too much for training the model.
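
For reference, a minimal sketch of the frame-extraction step using ffmpeg's fps filter; lowering the fps argument is one easy way to keep the frame count manageable. The file and directory names are placeholders, and this is not code from the repository.

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: int = 30) -> None:
    """Dump video frames as numbered PNGs using ffmpeg's fps filter."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", video_path,
            "-vf", f"fps={fps}",  # lower this (e.g. 5) to reduce the number of frames
            str(Path(out_dir) / "%06d.png"),
        ],
        check=True,
    )

# Example (placeholder paths): extract_frames("episode01.mp4", "DATA_PATH/anime01", fps=30)
```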

OK! Thank you for your response, I will give it a try.

Hello, I have a few questions regarding training:

  1. Did you put folders from 18 different anime into the DATA_PATH? I'm asking because it seems from the paper that only data from one anime was used for training, and data from other anime were used for testing. I just want to confirm.
  2. Could you please explain what 'validsize' and 'anime_dir' in the param.yaml file are used for, and what they should be set to?
  3. In your code, is there a testing process included, or do I need to write my own code for inference?

I am really sorry for the late response. First, I should mention that I do not follow the original paper rigorously; I only borrowed the ideas from the Method (Section 3) and the shot selection (the first part of Section 4.1). So I have not been careful about dataset selection.

  1. Yes. The anime_dir parameter in param.yaml sets which animations (directory names in DATA_PATH) are used for training. If you have 10 animations for training and 3 datasets for testing, you should list the 10 training animations in anime_dir; for the 3 test datasets, you need to write your own inference code (see the sketch after this list).
  2. validsize is the batch size used during validation. Sorry for the confusion; the name is very close to valid_size.
  3. As already mentioned in 1, you need to write your own code for inference.
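
As a rough illustration of how the answer above maps onto the config, here is a minimal sketch that loads param.yaml and separates the training animations (the directories listed under anime_dir) from held-out directories for your own inference code. The location of DATA_PATH and the exact shape of the YAML beyond anime_dir and validsize are assumptions; adapt them to the actual files in the repository.

```python
import yaml
from pathlib import Path

DATA_PATH = Path("data/")  # assumption: root directory containing one folder per animation

with open("param.yaml") as f:
    param = yaml.safe_load(f)

# Directories used for training (assumed to be a list of folder names in param.yaml).
train_dirs = [DATA_PATH / name for name in param["anime_dir"]]
# Everything else under DATA_PATH is held out for your own inference/testing code.
held_out = [d for d in DATA_PATH.iterdir() if d.is_dir() and d not in train_dirs]

print(f"training on {len(train_dirs)} animations, "
      f"validation batch size (validsize) = {param['validsize']}, "
      f"{len(held_out)} directories held out for testing")
```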

Thank you very much for your response. I have trained the model myself, and the CTN part seems to be fine (picture 1). However, the visualization images produced during TCN training come out gray. I have written some testing code, and it outputs a similar result (picture 2). Do you have any idea what the problem might be?
[Screenshot 2023-12-19, 4:05 PM (picture 1)]
[Screenshot 2023-12-19, 4:03 PM (picture 2)]

As I mentioned before, there seems to be an issue with the TCN part. The visualization images saved during training, when processed through the TCN, come out entirely gray, whereas the CTN output is normal. I would like to ask whether I made a mistake somewhere or whether this behavior is expected.

I am really sorry for the late response. I do not have a solid answer to your question. I found that training the TCN was unstable, and that changing the hyperparameters (batch size and learning rate) led to stable behavior. Could you try increasing the batch size or decreasing the learning rate? If that does not work, I do not have any other ideas.

Increasing my batch size causes CUDA out-of-memory errors :( I will try reducing the learning rate to see if it helps. Thank you for the suggestion. However, I want to confirm: when you say 'unstable', are you referring to the gray-image situation?
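
A side note on the out-of-memory problem: a common workaround for increasing the effective batch size without increasing per-step GPU memory is gradient accumulation. The sketch below uses hypothetical names (generator, optimizer, train_loader, loss_fn) and is not code from this repository.

```python
def train_with_accumulation(generator, optimizer, train_loader, loss_fn, accum_steps: int = 4):
    """One epoch with gradient accumulation: effective batch = loader batch * accum_steps."""
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        loss = loss_fn(generator, batch)  # hypothetical per-batch generator loss
        (loss / accum_steps).backward()   # scale so gradients average over the virtual batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```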

However, I want to confirm: when you say 'unstable', are you referring to the gray-image situation?

Yes. The gray images are likely a result of mode collapse: the trained generator has found an easy solution. You might need to add a regularization loss term to avoid the mode collapse; if decreasing the learning rate does not work, that might help.
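
As one concrete example of such a regularization term (an illustration, not necessarily the term the author has in mind): adding an L1 reconstruction loss between the TCN output and the ground-truth color frame penalizes a constant gray output, since such an output is far from real frames in L1 distance. The function below is a hedged sketch, with lambda_recon chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def generator_loss_with_recon(adv_loss: torch.Tensor,
                              fake_frame: torch.Tensor,
                              real_frame: torch.Tensor,
                              lambda_recon: float = 10.0) -> torch.Tensor:
    """Adversarial loss plus an L1 reconstruction term to discourage mode collapse."""
    recon = F.l1_loss(fake_frame, real_frame)
    return adv_loss + lambda_recon * recon
```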