A powerful Python-based audio processing tool that automatically removes silence from audio files and transcribes the remaining speech to text using Google Speech Recognition.
- Intelligent Silence Removal: Automatically detects and removes silence segments from audio files
- Multi-format Support: Works with WAV, MP3, M4A, OGG, and FLAC audio formats
- Speech-to-Text Transcription: Uses Google Speech Recognition for accurate transcription
- Database Integration: SQLite database stores processing metadata and transcriptions
- Duration Tracking: Monitors original vs. processed audio durations
- Configurable Parameters: Customizable silence detection thresholds
- Robust Error Handling: Comprehensive error management and validation
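Duration tracking amounts to comparing the length of the audio before and after silence removal. The sketch below assumes WAV input and uses only the standard-library `wave` module. The project may measure durations differently (for example through pydub), and `wav_duration_seconds` and `percent_removed` are hypothetical helper names, not functions from this repository.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def percent_removed(original_s: float, processed_s: float) -> float:
    """Share of the original duration removed by silence trimming, in percent."""
    if original_s <= 0:
        raise ValueError("original duration must be positive")
    return 100.0 * (original_s - processed_s) / original_s
```

For example, `percent_removed(10.0, 7.5)` returns `25.0`.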
Requirements:
- Python 3.7 or higher
- Internet connection (for Google Speech Recognition)
- Clone the repository:
  git clone https://github.com/Amin-moniry-pr7/Audio-Processing-Transcription.git
  cd Audio-Processing-Transcription
- Install dependencies:
  pip install -r requirements.txt
- Run the application:
  python CONVERT_AUDIO_TO_TEXT_AND_REMOVE_SILENCE.py
When you run the main script, you'll be prompted to provide:
- Audio file path: Path to your input audio file
- Output file path: Where to save the transcription (e.g., output.txt)
- Language code: Language for transcription (e.g., en-US, de-DE, fr-FR)
- Silence parameters:
  - Minimum silence length (milliseconds, e.g., 1000)
  - Silence threshold (dB, e.g., -40.0)
Example session:
Enter the path to an audio file: ./audio/interview.mp3
Enter the path to save the transcription: ./transcriptions/interview.txt
Enter the language code: en-US
Minimum silence length in milliseconds: 1000
Silence threshold in dB: -40.0
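Entering a non-numeric value at the last two prompts triggers the "Invalid input" case described further down. One way to handle it is to re-prompt until the value is valid. This is a minimal sketch assuming the constraints listed there (a positive integer length, a negative threshold); `ask_silence_parameters` and its `input_fn` parameter are hypothetical, and `input_fn` exists only so the prompts can be fed programmatically.

```python
from typing import Callable, Dict, Union


def ask_silence_parameters(
    input_fn: Callable[[str], str] = input,
) -> Dict[str, Union[int, float]]:
    """Prompt until both silence parameters are valid (hypothetical helper).

    Minimum silence length: positive integer, in milliseconds.
    Silence threshold: negative number, in dB.
    """
    while True:
        raw = input_fn("Minimum silence length in milliseconds: ")
        try:
            min_silence_len = int(raw)
        except ValueError:
            print("Invalid input: enter a whole number of milliseconds.")
            continue
        if min_silence_len > 0:
            break
        print("Invalid input: the length must be positive.")

    while True:
        raw = input_fn("Silence threshold in dB: ")
        try:
            silence_thresh = float(raw)
        except ValueError:
            print("Invalid input: enter a number such as -40.0.")
            continue
        if silence_thresh < 0:
            break
        print("Invalid input: the threshold must be negative.")

    return {"min_silence_len": min_silence_len, "silence_thresh": silence_thresh}
```

In normal use the function is called without arguments and reads from the terminal via `input`.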
Audio-Processing-Transcription/
├── CONVERT_AUDIO_TO_TEXT_AND_REMOVE_SILENCE.py   # Main entry point
├── Database_And_prepare_audio.py                 # Database operations & audio preparation
├── Remove_silence_and_mesuere.py                 # Silence removal & duration measurement
├── Speech_and_transcribe.py                      # Speech-to-text processing
├── requirements.txt                              # Project dependencies
├── LICENSE                                       # Apache 2.0 License
├── .gitignore                                    # Git ignore rules
└── README.md                                     # This file
The application creates an SQLite database (PODCAST.db) with the following structure:
Column | Type | Description |
---|---|---|
id | INTEGER | Primary key (auto-increment) |
input_path | TEXT | Original audio file path |
output_path | TEXT | Transcription output file path |
language | TEXT | Language code used for transcription |
original_duration | REAL | Duration of original audio (seconds) |
processed_duration | REAL | Duration after silence removal (seconds) |
transcription | TEXT | Full transcription text |
created_at | TIMESTAMP | Processing timestamp |
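The schema above maps directly onto a `CREATE TABLE` statement. The following sketch uses Python's built-in `sqlite3` module. The table name `transcriptions` and the helpers `open_db` and `save_record` are assumptions for illustration; the project's own DDL in `Database_And_prepare_audio.py` may differ.

```python
import sqlite3

# Columns and types follow the schema table; the table name is hypothetical.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS transcriptions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    input_path TEXT NOT NULL,
    output_path TEXT NOT NULL,
    language TEXT NOT NULL,
    original_duration REAL,
    processed_duration REAL,
    transcription TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = "PODCAST.db") -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(CREATE_TABLE_SQL)
    return conn


def save_record(
    conn: sqlite3.Connection,
    input_path: str,
    output_path: str,
    language: str,
    original_duration: float,
    processed_duration: float,
    transcription: str,
) -> int:
    """Insert one processing record and return its auto-generated id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO transcriptions (input_path, output_path, language, "
            "original_duration, processed_duration, transcription) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (input_path, output_path, language,
             original_duration, processed_duration, transcription),
        )
    return cur.lastrowid
```

`created_at` is filled in by SQLite's `CURRENT_TIMESTAMP` default, so the caller never has to pass it.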
Supported audio formats:
- WAV (recommended for best quality)
- MP3 (most common format)
- M4A (Apple format)
- OGG (open-source format)
- FLAC (lossless compression)
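Because only these five formats are accepted, and WAV needs no conversion step (see the tips further down), a pre-check could look at the file extension. A minimal sketch, assuming the tool decides by extension rather than by inspecting the file contents; `check_audio_format` is a hypothetical name.

```python
from pathlib import Path

# Extensions taken from the supported-formats list.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flac"}


def check_audio_format(path: str) -> bool:
    """Validate the extension and return True if conversion to WAV is needed.

    Raises ValueError for extensions outside the supported list.
    """
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {ext or '(none)'}")
    return ext != ".wav"
```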
Common language codes for transcription:
- en-US: English (US)
- en-GB: English (UK)
- de-DE: German
- fr-FR: French
- es-ES: Spanish
- it-IT: Italian
- ja-JP: Japanese
- ko-KR: Korean
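A mistyped code such as `EN_us` is a common cause of transcription errors. The sketch below normalizes casing and checks that a code has the two-part `language-REGION` shape of the examples above. `normalize_language_code` is a hypothetical helper built on an assumption: it does not know which codes Google's service actually supports, and it would reject valid tags that have more than two parts.

```python
import re

# Codes and names from the list above; the service accepts many more.
KNOWN_LANGUAGES = {
    "en-US": "English (US)",
    "en-GB": "English (UK)",
    "de-DE": "German",
    "fr-FR": "French",
    "es-ES": "Spanish",
    "it-IT": "Italian",
    "ja-JP": "Japanese",
    "ko-KR": "Korean",
}

# Matches the "language-REGION" shape used above, e.g. "en-US".
_CODE_SHAPE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def normalize_language_code(code: str) -> str:
    """Fix common casing/separator mistakes ("EN_us" -> "en-US") and check the shape."""
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2:
        raise ValueError(f"Expected a code like 'en-US', got {code!r}")
    normalized = f"{parts[0].lower()}-{parts[1].upper()}"
    if not _CODE_SHAPE.fullmatch(normalized):
        raise ValueError(f"Expected a code like 'en-US', got {code!r}")
    return normalized
```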
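- Minimum Silence Length: Minimum duration (in ms) a quiet stretch must last to be treated as silence
  - Typical range: 500-2000 ms
  - Lower values = more aggressive silence removal (shorter pauses are removed too)
- Silence Threshold: Volume level (in dB, relative to full scale, so negative) below which audio is considered silence
  - Typical range: -30 to -50 dB
  - Lower (more negative) values = stricter detection: only quieter passages count as silence, so less audio is removed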
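To make the interplay of the two parameters concrete, here is a simplified pure-Python illustration. It assumes loudness has already been measured once per millisecond and is not pydub's actual implementation; `find_silent_ranges` is a hypothetical helper. A stretch is reported only if every value in it is below `silence_thresh` and it lasts at least `min_silence_len` ms.

```python
from typing import List, Optional, Tuple


def find_silent_ranges(
    loudness_dbfs: List[float],
    min_silence_len: int,
    silence_thresh: float,
) -> List[Tuple[int, int]]:
    """Return [start, end) millisecond ranges that count as silence.

    loudness_dbfs holds one loudness value per millisecond (assumed input).
    """
    ranges: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, level in enumerate(loudness_dbfs):
        if level < silence_thresh:
            if run_start is None:
                run_start = i  # a quiet stretch begins
        elif run_start is not None:
            # A quiet stretch ended; keep it only if it was long enough.
            if i - run_start >= min_silence_len:
                ranges.append((run_start, i))
            run_start = None
    # Close a quiet stretch that runs to the end of the audio.
    if run_start is not None and len(loudness_dbfs) - run_start >= min_silence_len:
        ranges.append((run_start, len(loudness_dbfs)))
    return ranges
```

With `levels = [-20.0] * 5 + [-60.0] * 3 + [-20.0] * 2 + [-60.0] * 6`, a 3 ms minimum and a -40 dB threshold find two silent stretches, `[(5, 8), (10, 16)]`. Raising the minimum to 4 ms drops the first one, while raising the threshold to -10 dB turns the whole clip into one silent stretch.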
The tool produces the following outputs during processing:
- Processed audio: [original_name]_no_silence.wav, the audio with silence removed
- Transcription file: User-specified text file containing the full transcription
- Database record: Entry in PODCAST.db with all processing metadata
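The processed-audio name follows a fixed pattern. A sketch with `pathlib`, assuming the file is written next to the input (the actual output directory is not documented here); `no_silence_path` is a hypothetical helper.

```python
from pathlib import Path


def no_silence_path(input_path: str) -> Path:
    """Build '[original_name]_no_silence.wav' in the input file's directory."""
    source = Path(input_path)
    # The extension is always .wav, whatever the input format was.
    return source.with_name(f"{source.stem}_no_silence.wav")
```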
Python dependencies:
- pydub>=0.25.1
- SpeechRecognition>=3.8.1
Additional system requirements:
- FFmpeg (for audio format conversion)
- Internet connection (for Google Speech Recognition API)
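Since format conversion depends on FFmpeg, checking for it up front gives a clearer message than a failure midway through processing. A minimal sketch using `shutil.which`; `find_ffmpeg` and `require_ffmpeg` are hypothetical names, and the assumption that non-WAV input goes through FFmpeg reflects how pydub usually decodes compressed formats.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the path of the ffmpeg executable on PATH, or None if missing."""
    return shutil.which("ffmpeg")


def require_ffmpeg() -> str:
    """Fail early with a readable message instead of a conversion error later."""
    path = find_ffmpeg()
    if path is None:
        raise RuntimeError(
            "FFmpeg was not found on PATH; it is needed to convert "
            "MP3, M4A, OGG and FLAC input."
        )
    return path
```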
Tips:
- Use WAV format for fastest processing (no conversion needed)
- Tune the silence parameters to your audio content:
  - Podcasts/interviews: 1000 ms silence, -40 dB threshold
  - Music: 500 ms silence, -50 dB threshold
  - Noisy environments: 2000 ms silence, -30 dB threshold
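The three recommendations can be kept as named presets. This sketch assumes the keys `min_silence_len` and `silence_thresh`, which match the keyword names of pydub's `split_on_silence`; whether the project passes them that way is an assumption, and `SILENCE_PRESETS` and `preset_parameters` are hypothetical.

```python
from typing import Dict, Tuple, Union

# (minimum silence length in ms, threshold in dB), values from the tips above.
SILENCE_PRESETS: Dict[str, Tuple[int, float]] = {
    "podcast": (1000, -40.0),
    "music": (500, -50.0),
    "noisy": (2000, -30.0),
}


def preset_parameters(content_type: str) -> Dict[str, Union[int, float]]:
    """Look up a preset by name (case-insensitive) and return keyword arguments."""
    try:
        min_len, thresh = SILENCE_PRESETS[content_type.lower()]
    except KeyError:
        known = ", ".join(sorted(SILENCE_PRESETS))
        raise ValueError(
            f"Unknown content type {content_type!r}; choose one of: {known}"
        ) from None
    return {"min_silence_len": min_len, "silence_thresh": thresh}
```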
"Error: File not found"
- Verify the audio file path is correct
- Ensure the file exists and is readable
"Invalid input" for silence parameters
- Ensure minimum silence length is a positive integer
- Ensure silence threshold is a negative number
Transcription errors
- Check your internet connection
- Verify the language code is correct
- Ensure the audio quality is sufficient for recognition
Audio format not supported
- Install FFmpeg for additional format support
- Convert audio to WAV format manually if needed
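The "Error: File not found" checks can be done before any processing starts. A sketch, assuming the path comes from the first prompt; `check_input_file` is a hypothetical helper, and its error messages only mirror the wording above.

```python
import os


def check_input_file(path: str) -> str:
    """Return the absolute path if the file exists and is readable, else raise."""
    path = os.path.expanduser(path)  # allow paths such as ~/audio/interview.mp3
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Error: File not found: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Error: File is not readable: {path}")
    return os.path.abspath(path)
```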
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Author: Amin Moniry
- GitHub: @Amin-moniry-pr7
Acknowledgments:
- Google Speech Recognition API for transcription services
- pydub library for audio processing
- SQLite for lightweight database functionality
⭐ If you found this project helpful, please consider giving it a star! ⭐