Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".

Pre-processing and training not working on custom dataset?

graham-eisele opened this issue · comments

Whenever I preprocess the custom dataset, this is the output:

```
C:\Users\Graham\Desktop\Lip2Wav-master>python preprocess.py --speaker_root Dataset/larry --speaker larry
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Started processing for Dataset/larrywith 1 GPUs
0it [00:00, ?it/s]

C:\Users\Graham\Desktop\Lip2Wav-master>
```

but no new output is produced, and when I try to train, I get this:

```
C:\Users\Graham\Desktop\Lip2Wav-master>python train.py first_run --data_root Dataset/larry/ --preset synthesizer/presets/larry.json
Arguments:
    name: first_run
    data_root: Dataset/larry/
    preset: synthesizer/presets/larry.json
    models_dir: synthesizer/saved_models/
    mode: synthesis
    GTA: True
    restore: True
    summary_interval: 2500
    embedding_interval: 1000000000
    checkpoint_interval: 1000
    eval_interval: 1000
    tacotron_train_steps: 2000000
    tf_log_level: 1

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    log_dir, hparams = prepare_run(args)
  File "train.py", line 21, in prepare_run
    hparams.add_hparam('all_images', all_images)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\training\python\training\hparam.py", line 485, in add_hparam
    'Multi-valued hyperparameters cannot be empty: %s' % name)
ValueError: Multi-valued hyperparameters cannot be empty: all_images

C:\Users\Graham\Desktop\Lip2Wav-master>
```

How do you properly use a custom dataset with this project? Thank you.

`ValueError: Multi-valued hyperparameters cannot be empty: all_images`

Ensure that this line is returning the list of all face images. If not, your path is incorrect.

It is not returning the list of all face images. I traced it back to `get_image_list` in `hparams.py`: the line `filelist.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))` is causing the issue, because the preprocess.py script does not appear to output anything.

Even when using the datasets used for the paper, I still get this error after following all of the instructions in the readme.

Even when using the datasets used for the paper, I still get this error after following all of the instructions in the readme.

Yes, I got this error too, and I just found a solution for it.

You need to revise the code in preprocess.py as follows:
Before:
`fulldir = vfile.replace('/intervals/', '/preprocessed/')`
After:
`fulldir = vfile.replace('intervals', 'preprocessed')`

You only need to revise lines 65 and 97 (two lines only).

The problem is caused by line 124. My environment is Windows 10, but the author's environment is presumably Linux. The path delimiter is different, so `vfile.replace` does not behave as expected.

You also don't need to run preprocess.py again to generate the images (I know it takes too long). Instead, you can run `python preprocess_win_mv.py <name>` from my repo https://github.com/Halle-Astra/lip2wav_revised.

I hope this solution helps you solve it.
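The failure mode is easy to reproduce with a plain string; a minimal sketch, assuming a hypothetical Windows-style path (the real dataset layout may differ):

```python
# Hypothetical Windows-style path, as glob would return it on Windows:
vfile = "Dataset\\larry\\intervals\\clip_001\\00001.mp4"

# The original (Linux-oriented) replacement looks for forward slashes,
# so it matches nothing and the path comes back unchanged:
broken = vfile.replace('/intervals/', '/preprocessed/')
print(broken == vfile)  # True: nothing was replaced

# The separator-agnostic replacement suggested above does work:
fixed = vfile.replace('intervals', 'preprocessed')
print(fixed)  # Dataset\larry\preprocessed\clip_001\00001.mp4
```

Note that the bare substring replace would also rewrite any folder that merely contains "intervals" in its name, but for this dataset layout it is sufficient.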

That seems to work, but then I get the same error with train.py.

Changing

```python
def get_image_list(split, data_root):
    filelist = []
    with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
        for vid_id in vidlist:
            vid_id = vid_id.strip()
            filelist.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))
    return filelist
```

to

```python
def get_image_list(split, data_root):
    trainList = []
    valList = []

    if split == "train":
        with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
            for vid_id in vidlist:
                vid_id = vid_id.strip()
                trainList.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))

    if split == "val":
        with open(os.path.join(data_root, '{}.txt'.format(split))) as vidlist:
            for vid_id in vidlist:
                vid_id = vid_id.strip()
                valList.extend(list(glob(os.path.join(data_root, 'preprocessed', vid_id, '*/*.jpg'))))

    if split == "train":
        #print(trainList)
        return trainList

    if split == "val":
        #print(valList)
        return valList
```

lets me train, but when I start training it says there are only 0.29 hours of training data when there are over 11 hours. Also, when I run complete_test_generate.py, I get:

```
0.0 hours is available for testing
0it [00:00, ?it/s]
```

The number of hours is calculated as (total number of images / fps) / 3600. Ensure all the face images are detected by the script.
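In other words (a sketch of the arithmetic described above; the actual fps lives in hparams.py and may differ from the 30 assumed here):

```python
def available_hours(num_images, fps=30):
    # Each face crop corresponds to one video frame, so the total seconds
    # of data is num_images / fps; divide by 3600 to convert to hours.
    return (num_images / fps) / 3600

# 11 hours of 30 fps video should yield 11 * 3600 * 30 = 1,188,000 face crops;
print(available_hours(1_188_000))  # 11.0
# a report of ~0.29 hours therefore means only ~31,000 crops were found.
print(round(available_hours(31_320), 2))  # 0.29
```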

I fixed it now, and it says the correct number of hours and runs, but I don't see any .wavs being output.

It says this:

```
np_resource = np.dtype([("resource", np.ubyte, 1)])
2.2698796296296297 hours is available for testing
0it [00:00, ?it/s] | 0/1 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 999.36it/s]
```

Does your test set only contain a single video? Please check if any exceptions are being silently ignored in the code.

I tried with multiple videos and one video and it output the same thing.

Would this be the correct path to the checkpoint? `C:\Users\Graham\Desktop\Lip2Wav-master\synthesizer\saved_models\logs-01\taco_pretrained\tacotron_model.ckpt-5000.data-00000-of-00001`

Is there a limit to the video length when using complete_test_generate.py?

There is a minimum frame limit = `hparams.T`. Please add print statements in different places and see if your videos are being skipped.
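A quick way to surface those skips; a hypothetical sketch (the real window length is whatever your preset sets `hparams.T` to, and the check lives inside complete_test_generate.py, so verify the details against your local copy):

```python
def check_min_frames(vidpath, num_frames, T):
    # Videos with fewer frames than the window length T cannot produce a
    # single training/testing sample, so they are silently dropped;
    # log them instead of skipping quietly.
    if num_frames < T:
        print('skipping {}: {} frames < T={}'.format(vidpath, num_frames, T))
        return False
    return True

check_min_frames('Dataset/larry/preprocessed/vid1', 10, 25)
```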

`C:\Users\Graham\Desktop\Lip2Wav-master\synthesizer\saved_models\logs-01\taco_pretrained\tacotron_model.ckpt-5000.data-00000-of-00001`

Yes.

I still haven't been able to solve or find that issue, but now when I try to resume training from checkpoint, I get this error:

```
tensorflow.python.framework.errors_impl.NotFoundError: FindFirstFile failed for: synthesizer/saved_models/logs-final/taco_pretrained : The system cannot find the path specified. ; No such process
```

after running this command

```
python train.py 01 --data_root Dataset/chem/ --preset synthesizer/presets/chem.json
```

Whenever I print `vidpath` from complete_test_generate.py, the output is just `Dataset/`. Should it not be `Dataset/chem/preprocessed/0010298\0010298`, or is that correct? Also, when printing `videos` as retrieved from `get_testlist(args.data_root)`, the output is `['Dataset']`.