netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Generated speech has a clicking sound at the beginning

zsanjin-p opened this issue · comments

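The speech clips were generated through the API.
Not every generated clip has this click, but quite a few of them start with a pop or a tick, like an electrical crackle. What causes this? Do you see it too?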

Could you please provide more details about this issue, such as the specific text, speaker ID, and audio samples?

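I ran into this as well, and changing the speaker ID to anything else does not help. Please take a look at what is wrong. Audio samples are attached:
response.zip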

When using the webpage-based demo by running streamlit run demo_page.py, the generated audio contains no noise. However, I do notice noise at the beginning of the sample audio. Can you please provide more details about this issue?

I am using the API. Here is my docker run command:
docker run --gpus "device=3" -d --name EmotiVoice -p 28021:8000 -v /raid/liuhao/EmotiVoice:/workspace/EmotiVoice -w /workspace/EmotiVoice/EmotiVoice emoti-voice:v1 env LANG=C.UTF-8 sh -c "uvicorn openaiapi:app --reload --host 0.0.0.0 --port 8000 >> log/all.log 2>&1"

import os
from pydub import AudioSegment
import logging

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def remove_or_silence_noise_from_audio_files(directory, noise_duration_ms, mode):
    # Determine the output folder for processed audio files
    output_folder = os.path.join(directory, "Processed_Audio")
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        logging.info(f"Folder created: {output_folder}")

    # Get all audio files
    audio_files = [file for file in os.listdir(directory) if file.endswith(('.mp3', '.wav'))]
    logging.info(f"Found {len(audio_files)} audio files.")

    # Initialize statistics variables
    success_count = 0
    fail_count = 0
    failed_files = []

    # Process each file
    for file in audio_files:
        file_path = os.path.join(directory, file)
        try:
            # Load the audio
            audio = AudioSegment.from_file(file_path)
            logging.info(f"Processing audio file: {file_path}")

            if mode == 1:
                # Remove the first noise_duration_ms milliseconds of the audio
                processed_audio = audio[noise_duration_ms:]
            elif mode == 2:
                # Replace the first noise_duration_ms milliseconds with silence at the source sample rate
                silence = AudioSegment.silent(duration=noise_duration_ms, frame_rate=audio.frame_rate)
                processed_audio = silence + audio[noise_duration_ms:]
            else:
                # Without this branch an unknown mode would surface as an UnboundLocalError below
                raise ValueError(f"Unsupported mode: {mode}")

            # Save the new audio file
            new_file_path = os.path.join(output_folder, file)
            processed_audio.export(new_file_path, format=file[-3:])
            logging.info(f"Processed audio file saved to: {new_file_path}")
            success_count += 1
        except Exception as e:
            logging.error(f"Error processing audio file {file_path}: {e}")
            fail_count += 1
            failed_files.append((file_path, str(e)))

    # Log the results
    logging.info(f"Processing complete. Success: {success_count}, Failures: {fail_count}")
    if fail_count > 0:
        logging.info("Failed files and reasons:")
        for file, error in failed_files:
            logging.info(f"File: {file}, Error: {error}")

if __name__ == "__main__":
    # User inputs the processing time, default is 100ms
    try:
        noise_duration_ms = int(input("Enter the noise processing time (ms, default 100ms): ") or "100")
    except ValueError:
        print("Invalid input, using default value of 100ms")
        noise_duration_ms = 100
    
    # User chooses the processing mode; keep asking until the input is 1 or 2
    mode = None
    while mode not in (1, 2):
        try:
            mode = int(input("Choose the mode (1: Remove beginning noise, 2: Replace beginning noise with silence): "))
        except ValueError:
            mode = None
        if mode not in (1, 2):
            print("Invalid mode, must be 1 or 2")
    
    # Call the function to process audio files in the current directory
    remove_or_silence_noise_from_audio_files(os.getcwd(), noise_duration_ms, mode)
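
If the clips are plain PCM WAV files, mode 2 of the script above (replace the beginning with silence) can probably also be done with only the Python standard library, which avoids the pydub and ffmpeg dependency. The following is a minimal sketch under that assumption. The function name `silence_wav_head` and its parameters are hypothetical and are not part of EmotiVoice or pydub, and it does not handle MP3 or compressed WAV.

```python
import wave


def silence_wav_head(src_path, dst_path, noise_duration_ms=100):
    """Overwrite the first noise_duration_ms of a PCM WAV file with silence.

    Hypothetical helper for illustration; returns the number of frames silenced.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    # One frame holds one sample for every channel
    bytes_per_frame = params.sampwidth * params.nchannels
    head_frames = min(params.nframes, params.framerate * noise_duration_ms // 1000)
    head_bytes = head_frames * bytes_per_frame

    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are signed (silence is 0x00)
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    patched = silent_byte * head_bytes + frames[head_bytes:]

    # Write the patched samples with the same channel count, sample width, and rate
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(patched)
    return head_frames
```

For example, `silence_wav_head("in.wav", "out.wav", 100)` would write a copy of `in.wav` whose first 100 ms are silent and whose total length is unchanged. As with the script above, this only masks the click after generation; it does not address whatever produces it in the API path.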