Clicking sound at the start of generated speech

Question

zsanjin-p opened this issue 4 months ago · comments

zsanjin-p commented 4 months ago

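The audio clips were generated through the API.
Not every generated clip has this clicking sound, but quite a few have a click or a tick right at the start, like an electrical pop. What causes this? Has anyone else run into it?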

Yanqing Sun · Answer 1 · Mon Jan 29 2024 10:33:01 GMT+0800 (China Standard Time)

Could you please provide more details about this issue, such as the specific text, speaker ID, and audio samples?

lh7343 · Answer 2 · Thu Mar 21 2024 19:16:29 GMT+0800 (China Standard Time)

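I'm seeing this too, and changing the speaker ID makes no difference. Please help look into what is going wrong. Audio samples are attached: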
response.zip

Yanqing Sun · Answer 4 · Tue Mar 26 2024 11:10:02 GMT+0800 (China Standard Time)

When using the webpage-based demo by running streamlit run demo_page.py, the generated audio contains no noise. However, I do notice noise at the beginning of the sample audio. Can you please provide more details about this issue?

lh7343 · Answer 5 · Tue Apr 02 2024 16:17:22 GMT+0800 (China Standard Time)

I'm using the API. Here is my docker run command:
docker run --gpus "device=3" -d --name EmotiVoice -p 28021:8000 -v /raid/liuhao/EmotiVoice:/workspace/EmotiVoice -w /workspace/EmotiVoice/EmotiVoice emoti-voice:v1 env LANG=C.UTF-8 sh -c "uvicorn openaiapi:app --reload --host 0.0.0.0 --port 8000 >> log/all.log 2>&1"

zsanjin-p · Answer 6 · Sun Apr 21 2024 01:32:34 GMT+0800 (China Standard Time)

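# Batch script: trims or silences the first few milliseconds of every
# .mp3/.wav file in a directory and writes the results to Processed_Audio/.
# Note: pydub needs ffmpeg available to read or write MP3 files.
import logging
import os

from pydub import AudioSegment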

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def remove_or_silence_noise_from_audio_files(directory, noise_duration_ms, mode):
    # Determine the output folder for processed audio files
    output_folder = os.path.join(directory, "Processed_Audio")
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        logging.info(f"Folder created: {output_folder}")

    # Get all audio files
    audio_files = [file for file in os.listdir(directory) if file.endswith(('.mp3', '.wav'))]
    logging.info(f"Found {len(audio_files)} audio files.")

    # Initialize statistics variables
    success_count = 0
    fail_count = 0
    failed_files = []

    # Process each file
    for file in audio_files:
        file_path = os.path.join(directory, file)
        try:
            # Load the audio
            audio = AudioSegment.from_file(file_path)
            logging.info(f"Processing audio file: {file_path}")

            if mode == 1:
                # Remove noise from the beginning of the audio for noise_duration_ms milliseconds
                processed_audio = audio[noise_duration_ms:]
            elif mode == 2:
                # Create a silence segment and replace the beginning noise_duration_ms milliseconds with it
                silence = AudioSegment.silent(duration=noise_duration_ms)
                processed_audio = silence + audio[noise_duration_ms:]
            else:
                # Fail with a clear message instead of leaving processed_audio undefined
                raise ValueError(f"Invalid mode {mode!r}, must be 1 or 2")

            # Save the new audio file
            new_file_path = os.path.join(output_folder, file)
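            # Derive the export format from the file extension rather than the last three characters
            processed_audio.export(new_file_path, format=os.path.splitext(file)[1][1:].lower())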
            logging.info(f"Processed audio file saved to: {new_file_path}")
            success_count += 1
        except Exception as e:
            logging.error(f"Error processing audio file {file_path}: {e}")
            fail_count += 1
            failed_files.append((file_path, str(e)))

    # Log the results
    logging.info(f"Processing complete. Success: {success_count}, Failures: {fail_count}")
    if fail_count > 0:
        logging.info("Failed files and reasons:")
        for file, error in failed_files:
            logging.info(f"File: {file}, Error: {error}")

if __name__ == "__main__":
    # User inputs the processing time, default is 100ms
    try:
        noise_duration_ms = int(input("Enter the noise processing time (ms, default 100ms): ") or "100")
    except ValueError:
        print("Invalid input, using default value of 100ms")
        noise_duration_ms = 100
    
    # User chooses the processing mode; keep asking until the input is 1 or 2
    mode = None
    while mode not in (1, 2):
        try:
            mode = int(input("Choose the mode (1: Remove beginning noise, 2: Replace beginning noise with silence): "))
        except ValueError:
            mode = None
        if mode not in (1, 2):
            print("Invalid mode, must be 1 or 2")
    
    # Call the function to process audio files in the current directory
    remove_or_silence_noise_from_audio_files(os.getcwd(), noise_duration_ms, mode)
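
A possible refinement of the script above: a hard cut at a fixed offset can itself leave a small click if the waveform is not near zero at the cut point, so a short linear fade-in is a common alternative. The following is only a sketch under assumptions not confirmed in this thread: it handles 16-bit PCM WAV files only, uses just the Python standard library, and the function names and the 20 ms default are hypothetical choices, not part of EmotiVoice or of the script above.

```python
import array
import sys
import wave


def fade_in_samples(samples, n_channels, frame_rate, fade_ms):
    """Return a copy of interleaved 16-bit samples with a linear fade-in applied."""
    total_frames = len(samples) // n_channels
    fade_frames = min(total_frames, frame_rate * fade_ms // 1000)
    out = array.array("h", samples)
    for frame in range(fade_frames):
        gain = frame / fade_frames  # ramps from 0.0 up toward 1.0
        for ch in range(n_channels):
            i = frame * n_channels + ch
            out[i] = int(out[i] * gain)
    return out


def fade_in_wav(src_path, dst_path, fade_ms=20):
    """Apply a linear fade-in to a 16-bit PCM WAV file (assumed input format)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    faded = fade_in_samples(samples, params.nchannels, params.framerate, fade_ms)
    if sys.byteorder == "big":
        faded.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(faded.tobytes())
```

Calling fade_in_wav("in.wav", "out.wav", fade_ms=20) with hypothetical file names would write a copy whose first 20 ms ramp up from silence. This only masks the artifact in the output files; it does not address whatever produces the click on the API path.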