Strange output of audio (generated audio does not match input) !!

Question

samiulextreem opened this issue 2 months ago

I am trying to generate voice, but the generated output is strange: the audio contains unwanted sentences that are not in the input. Why is that?

This does not happen every time, though. In the code below, the 2nd audio output matches the input, but the 1st and 3rd do not.

import torch
import ChatTTS
import torchaudio
from IPython.display import Audio

print(torch.cuda.is_available())  # check whether a CUDA GPU is visible

# If torch.compile hits an error, fall back to eager mode instead of raising
torch._dynamo.config.suppress_errors = True
torch.set_float32_matmul_precision('high')

chat = ChatTTS.Chat()
chat.load(compile=True) # Set to True for better performance

texts = [
    "So we found being competitive and collaborative was a huge way of staying motivated towards our goals, so one person to call when you fall off, one person who gets you back on then one person to actually do the activity with."
] +[
    "The Trench Crusader, a battered and scarred Leman Russ-pattern Space Wolf creaked and groaned as it emerged from the dusty, ravaged landscape of the planet Tartarus four. Captain Arcturus, a grizzled and battle-hardened Space Wolf stood at the helm, his eyes scanning the horizon for any sign of the enemy. The Crusader's hull was pockmarked with bullet holes and scorch marks, a testament to the countless battles it had fought and won. Its once-proud banner, emblazoned with the symbol of the Space Wolves, now hung limp"
]+ [
    "In the ravaged streets of the 41st millennium, the once-great city of Molech sprawled like a festering wound. The skies were perpetually shrouded in a toxic haze, and the air reeked of burning promethium and the stench of death.In this bleak world, the Imperium of Man held sway, its armies of Space Marines and Imperial Guard fighting a losing battle against the endless tide of xenos and heretics. Amidst the ruins, a lone figure emerged, clad in the battered power armor of a Imperial Guard veteran."
]

print(texts)
refined_text = chat.infer(texts, refine_text_only=True)
print(refined_text)

wavs = chat.infer(texts)
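for i, wav in enumerate(wavs):
    # Some torchaudio versions expect a 2D (channels, samples) tensor;
    # if this call fails, try torch.from_numpy(wav).unsqueeze(0) instead.
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wav), 24000)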

源文雨 · Answer 1 · Sun Aug 18 2024 15:35:00 GMT+0800 (China Standard Time)

Re-generating may fix the problem, because this is a random event. However, the longer the input sentences are, the more likely it is to happen.
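
One way to act on this is to break each long text into shorter pieces before passing them to chat.infer. The following is only a minimal sketch: split_into_chunks is a hypothetical helper (not part of ChatTTS), and the 200-character limit is an arbitrary assumption that would need tuning by ear.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Hypothetical helper: split text at sentence ends and pack the
    sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is kept whole."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = "One two. Three four! Five six? Seven."
print(split_into_chunks(demo, max_chars=20))
```

The resulting list can be passed to chat.infer in place of a single long string, as in the question's code, and any chunk that still comes out wrong can be re-generated on its own rather than re-running the whole text.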