Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

Training on custom data went well, but when I try to synthesise voice for new text this error occurs

Stanley80 opened this issue

I have been training Tacotron for days on my Mac OS X CPU, but synthesis returns the following error.

My system is:
Python 3.7
TF 1.13.1
Keras 2.2.2
librosa 0.6.2

Using TensorFlow backend.
loaded model at logs-Tacotron/taco_pretrained/tacotron_model.ckpt-1800
Hyperparameters:
GL_on_GPU: False
NN_init: True
NN_scaler: 0.3
allow_clipping_in_normalization: True
attention_dim: 128
attention_filters: 32
attention_kernel: (31,)
attention_win_size: 7
batch_norm_position: after
cbhg_conv_channels: 128
cbhg_highway_units: 128
cbhg_highwaynet_layers: 4
cbhg_kernels: 8
cbhg_pool_size: 2
cbhg_projection: 256
cbhg_projection_kernel_size: 3
cbhg_rnn_units: 128
cdf_loss: False
cin_channels: 80
cleaners: basic_cleaners
clip_for_wavenet: True
clip_mels_length: True
clip_outputs: True
cross_entropy_pos_weight: 1
cumulative_weights: True
decoder_layers: 2
decoder_lstm_units: 1024
embedding_dim: 512
enc_conv_channels: 512
enc_conv_kernel_size: (5,)
enc_conv_num_layers: 3
encoder_lstm_units: 256
fmax: 6600
fmin: 55
frame_shift_ms: None
freq_axis_kernel_size: 3
gate_channels: 256
gin_channels: -1
griffin_lim_iters: 60
hop_size: 551
input_type: raw
kernel_size: 3
layers: 20
leaky_alpha: 0.4
legacy: True
log_scale_min: -32.23619130191664
log_scale_min_gauss: -16.11809565095832
lower_bound_decay: 0.1
magnitude_power: 2.0
mask_decoder: False
mask_encoder: True
max_abs_value: 4.0
max_iters: 20000
max_mel_frames: 900
max_time_sec: None
max_time_steps: 11000
min_level_db: -100
n_fft: 1100
n_speakers: 5
normalize_for_wavenet: False
num_freq: 551
num_mels: 80
out_channels: 2
outputs_per_step: 2
postnet_channels: 512
postnet_kernel_size: (5,)
postnet_num_layers: 5
power: 1.5
predict_linear: True
preemphasis: 0.97
preemphasize: True
prenet_layers: [256, 256]
quantize_channels: 65536
ref_level_db: 20
rescale: True
rescaling_max: 0.999
residual_channels: 128
residual_legacy: True
sample_rate: 44100
signal_normalization: True
silence_threshold: 2
skip_out_channels: 128
smoothing: True
speakers: ['speaker0', 'speaker1', 'speaker2', 'speaker3', 'speaker4']
speakers_path: None
split_on_cpu: True
stacks: 2
stop_at_any: True
symmetric_mels: True
synthesis_constraint: False
synthesis_constraint_type: window
tacotron_adam_beta1: 0.9
tacotron_adam_beta2: 0.999
tacotron_adam_epsilon: 1e-06
tacotron_batch_size: 32
tacotron_clip_gradients: True
tacotron_data_random_state: 1234
tacotron_decay_learning_rate: True
tacotron_decay_rate: 0.5
tacotron_decay_steps: 18000
tacotron_dropout_rate: 0.5
tacotron_final_learning_rate: 0.0001
tacotron_fine_tuning: False
tacotron_initial_learning_rate: 0.001
tacotron_natural_eval: False
tacotron_num_gpus: 1
tacotron_random_seed: 5339
tacotron_reg_weight: 1e-06
tacotron_scale_regularization: False
tacotron_start_decay: 40000
tacotron_swap_with_cpu: False
tacotron_synthesis_batch_size: 1
tacotron_teacher_forcing_decay_alpha: None
tacotron_teacher_forcing_decay_steps: 40000
tacotron_teacher_forcing_final_ratio: 0.0
tacotron_teacher_forcing_init_ratio: 1.0
tacotron_teacher_forcing_mode: constant
tacotron_teacher_forcing_ratio: 1.0
tacotron_teacher_forcing_start_decay: 10000
tacotron_test_batches: None
tacotron_test_size: 0.05
tacotron_zoneout_rate: 0.1
train_with_GTA: True
trim_fft_size: 2048
trim_hop_size: 512
trim_silence: True
trim_top_db: 40
upsample_activation: Relu
upsample_scales: [11, 25]
upsample_type: SubPixel
use_bias: True
use_lws: False
use_speaker_embedding: True
wavenet_adam_beta1: 0.9
wavenet_adam_beta2: 0.999
wavenet_adam_epsilon: 1e-06
wavenet_batch_size: 8
wavenet_clip_gradients: True
wavenet_data_random_state: 1234
wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy']
wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy']
wavenet_decay_rate: 0.5
wavenet_decay_steps: 200000
wavenet_dropout: 0.05
wavenet_ema_decay: 0.9999
wavenet_gradient_max_norm: 100.0
wavenet_gradient_max_value: 5.0
wavenet_init_scale: 1.0
wavenet_learning_rate: 0.001
wavenet_lr_schedule: exponential
wavenet_natural_eval: False
wavenet_num_gpus: 1
wavenet_pad_sides: 1
wavenet_random_seed: 5339
wavenet_swap_with_cpu: False
wavenet_synth_debug: False
wavenet_synthesis_batch_size: 20
wavenet_test_batches: 1
wavenet_test_size: None
wavenet_warmup: 4000.0
wavenet_weight_normalization: False
win_size: 1100
Constructing model: Tacotron

Initialized Tacotron model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
GTA mode: False
Synthesis mode: True
Input: (?, ?)
device: 0
embedding: (?, ?, 512)
enc conv out: (?, ?, 512)
encoder out: (?, ?, 512)
decoder out: (?, ?, 80)
residual out: (?, ?, 512)
projected residual out: (?, ?, 80)
mel out: (?, ?, 80)
linear out: (?, ?, 551)
<stop_token> out: (?, ?)
Tacotron Parameters 29.023 Million.
Loading checkpoint: logs-Tacotron/taco_pretrained/tacotron_model.ckpt-1800
WARNING:tensorflow:From /Users/davidecangelosi/Desktop/workspace/venv3_01/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Starting Synthesis
0%| | 0/13 [00:00<?, ?it/s]
Traceback (most recent call last):
File "synthesize.py", line 100, in
main()
File "synthesize.py", line 90, in main
_ = tacotron_synthesize(args, hparams, taco_checkpoint, sentences)
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/Tacotron-2/tacotron/synthesize.py", line 136, in tacotron_synthesize
return run_eval(args, checkpoint_path, output_dir, hparams, sentences)
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/Tacotron-2/tacotron/synthesize.py", line 69, in run_eval
mel_filenames, speaker_ids = synth.synthesize(texts, basenames, eval_dir, log_dir, None)
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/Tacotron-2/tacotron/synthesizer.py", line 219, in synthesize
audio.save_wav(wav, os.path.join(log_dir, 'wavs/wav-{}-mel.wav'.format(basenames[i])), sr=hparams.sample_rate)
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/Tacotron-2/datasets/audio.py", line 13, in save_wav
wav *= 32767 / max(0.01, np.max(np.abs(wav)))
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2320, in amax
out=out, **kwargs)
File "/Users/davidecangelosi/Desktop/workspace/venv3_01/lib/python3.7/site-packages/numpy/core/_methods.py", line 26, in _amax
return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

I am quite frustrated after weeks of attempts to solve this issue.

Can someone help me?

Thank you very much
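
For what it's worth, the traceback pinpoints the failure: np.max raises exactly this ValueError when handed an empty array, so save_wav is being called with a zero-length waveform, i.e. the model produced no audio for that sentence. Below is a minimal sketch of a defensive guard; the empty-size check and its error message are my own additions, not part of the repo's datasets/audio.py (which, as the traceback shows, goes straight to the normalization line):

```python
import numpy as np
from scipy.io import wavfile

def save_wav(wav, path, sr):
    # Added guard (not in the original repo code): an empty array is what
    # triggers "zero-size array to reduction operation maximum which has
    # no identity" on the np.max call below.
    if wav.size == 0:
        raise ValueError('Synthesized waveform is empty -- check that the '
                         'input sentence is not blank (e.g. a stray BOM).')
    wav *= 32767 / max(0.01, np.max(np.abs(wav)))  # peak-normalize to int16 range
    wavfile.write(path, sr, wav.astype(np.int16))
```

This does not fix the root cause, but it turns the cryptic numpy error into a message that points at the offending input.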

I read that \ufeff is the BOM or "Byte Order Mark".
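
That is very likely the root cause here: text files saved by some editors start with a UTF-8 BOM, and if the sentence list is read with a plain utf-8 codec, the \ufeff sticks to (or entirely makes up) the first "sentence", which then synthesizes to a zero-length wav. A minimal sketch of the workaround, assuming the sentences come from a plain text file (sentences.txt is a placeholder name, not a file from the repo):

```python
# The 'utf-8-sig' codec strips a leading BOM (\ufeff) automatically;
# plain 'utf-8' would leave it attached to the first line.
with open('sentences.txt', encoding='utf-8-sig') as f:
    # Dropping blank lines as well, since an empty sentence would also
    # synthesize zero-length audio and crash save_wav as above.
    sentences = [line.strip() for line in f if line.strip()]

assert all('\ufeff' not in s for s in sentences)  # sanity check: BOM is gone
```

Alternatively, scrubbing each already-loaded string with s.replace('\ufeff', '') achieves the same thing.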