speech-translation

There are 7 repositories under speech-translation topic.

NVIDIA / NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
asr deeplearning generative-ai large-language-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts
Language:Python 13377
PaddlePaddle / PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper
Language:Python 11658
espnet / espnet
End-to-End Speech Processing Toolkit
chainer deep-learning end-to-end kaldi machine-translation pytorch singing-voice-synthesis speaker-diarization speech-enhancement speech-recognition speech-separation speech-synthesis speech-translation spoken-language-understanding text-to-speech voice-conversion
Language:Python 8881
huggingface / speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
ai assistant language-model machine-learning python speech speech-synthesis speech-to-text speech-translation
Language:Python 3898
microsoft / SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
speech-pretraining speech-recognition speech-synthesis speech-text-pretraining speech-translation speech2c speechlm speecht5 speechut vallex vatlm
Language:Python 1318
ictnlp / StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
all-in-one asr audio-processing machine-translation non-autoregressive seamless simultaneous-translation speech speech-enhancement speech-processing speech-recognition speech-synthesis speech-to-text speech-translation streaming-audio text-to-audio text-to-speech translation tts voice
Language:Python 1040
zhangshaolei1998 / Awesome-Simultaneous-Translation
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
machine-translation simultaneous-translation natural-language-processing speech-translation text-translation awesome paper paperlist nlp simultaneous-machine-translation streaming
574
Dadangdut33 / Speech-Translate
A realtime speech transcription and translation application using Whisper OpenAI and free translation API. Interface made using Tkinter. Code written fully in Python.
python speech-transcription speech-translation tkinter-python translate whisper
Language:Python 558
double22a / speech_dataset
The dataset of Speech Recognition
asr audio automatic-speech-recognition dataset deep-learning deep-neural-networks speech speech-diarization speech-enhancement speech-recognition speech-segmentation speech-separation speech-synthesis speech-to-text speech-translation text-to-speech tts voice-conversion wav
408
echogarden-project / echogarden
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.
command-line forced-alignment language-detection language-identification node-js source-separation speech speech-alignment speech-recognition speech-synthesis speech-to-text speech-translation text-to-speech voice-isolation
Language:TypeScript 326
kahne / SpeechTransProgress
Tracking the progress in end-to-end speech translation
artificial-intelligence machine-translation natural-language-generation natural-language-processing speech-processing speech-translation spoken-language-processing spoken-language-translation
260
MooreThreads / MooER
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
chatgpt gpt-4o large-language-models speech-interaction speech-recognition speech-to-speech speech-to-text speech-translation
Language:Python 197
dqqcasia / awesome-speech-translation
cascaded-speech-translation disfluency-detection machine-translation multimodal-machine-learning multimodal-machine-translation natural-language-processing non-autoregressive-translation punctuation-restoration simultaneous-translation speech speech-processing speech-recognition speech-synthesis speech-to-speech speech-to-subtitles speech-translation text-translation
177
bzhangGo / zero
Zero -- A neural machine translation system
aan adaptive-feature-selection average-attention-network deep-transformer depth-scaled-initialization fast-bidirectional-decoder l0drop massively-multilingual-translation neural-machine-translation opus-100 speech-translation transformer
Language:Python 150
ReneeYe / ConST
code for paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
contrastive-learning machine-translation naacl2022 neural-machine-translation pytorch speec speech-translation spoken-language-processing transformer translation
Language:Python 64
ictnlp / DASpeech
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
machine-translation speech-to-speech speech-to-speech-translation speech-translation
Language:Python 61
hlt-mt / FBK-fairseq
Repository containing the open source code of works published at the FBK MT unit.
deep-learning gender-bias pytorch simultaneous-translation speech-to-text speech-translation subtitling
Language:Python 42
mt-upc / SHAS
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
audio-segmentation speech-translation speech-to-text speech wav2vec2
Language:Python 38
Rongjiehuang / awesome-speech-to-speech-translation
List of direct speech-to-speech translation papers.
speech-translation s2st speech-to-speech-translation awesome awesome-list
37
ictnlp / STEMM
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
machine-translation speech-to-text speech-translation
Language:Python 36
George0828Zhang / torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
pytorch cif continuous-integrate-and-fire alignment asr automatic-speech-recognition monotonic speech speech-recognition speech-to-text speech-translation torch
Language:Python 33
ictnlp / DiSeg
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
machine-translation segment segmentation sequence-segmentation simultaneous-machine-translation simultaneous-translation speech speech-translation streaming streaming-speech-to-text
Language:Python 33
liamdugan / speech-to-speech
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
simultaneous-translation speech speech-processing speech-to-speech speech-translation
Language:Python 30
George0828Zhang / simulst
PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.
pytorch streaming speech-recognition speech-to-text speech-translation simultaneous-translation
Language:Python 25
KevKibe / African-Whisper
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
asr speech speech-recognition speech-to-text speech-transcription speech-translation whisper
Language:Python 25
mt-upc / ZeroSwot
Pushing the Limits of Zero-shot End-to-End Speech Translation
speech-translation translation
Language:Python 25
ictnlp / ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
machine-translation non-autoregressive-translation speech-to-speech-translation speech-translation text-to-speech zero-shot-speech-translation
Language:Python 24
ReneeYe / XSTNet
This is an implementation of paper "End-to-end Speech Translation via Cross-modal Progressive Training" (Interspeech2021)
machine-translation neural-machine-translation speech-recognition speech-translation spoken-language-processing tensorflow2 interspeech2021
Language:Python 20
VinAIResearch / PhoST
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation (INTERSPEECH 2022)
benchmark-dataset english english-to-vietnamese machine-translation phost speech-translation vietnamese
20
ictnlp / CRESS
Code for ACL 2023 main conference paper "Understanding and Bridging the Modality Gap for Speech Translation".
machine-translation speech-to-text speech-translation
Language:Python 17
ictnlp / ITST
Code for EMNLP 2022 main conference paper "Information-Transport-based Policy for Simultaneous Translation"
end-to-end-speech-translation machine-translation simultaneous-machine-translation simultaneous-translation speech-translation
Language:Python 14
ictnlp / BT4ST
Code for ACL 2023 main conference paper "Back Translation for Speech-to-text Translation Without Transcripts".
machine-translation speech-to-text speech-translation
Language:Python 13
bzhangGo / st_from_scratch
Revisiting End-to-End Speech-to-Text Translation From Scratch
speech-to-text-translation end-to-end-speech-translation speech-translation speech-translation-from-scratch
Language:Python 12
IESTAC
Giuseppe-Della-Corte / IESTAC
A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models
machine-translation speech-translation corpus parallel-corpus parallel-corpora end-to-end-machine-learning forced-alignment speech-processing mfcc-features bitext sentence-embeddings sentence-similarity statistical-machine-translation speech-recognition text-processing text-preprocessinig web-scraping named-entity-recognition audio-data sql-database
11
JeffWang0325 / Microsoft-Azure-Cognitive-Services
🖍️ This project combines multiple operations in Microsoft Azure Cognitive Services into one GUI, including QnA Maker, LUIS, Computer Vision, Custom Vision, Face, Form Recognizer, Text To Speech, Speech To Text and Speech Translation. It's very user-friendly for users to implement any operation mentioned above.
microsoft azure cognitive-services computer-vision luis luis-ai qnamaker qna-maker speech-recognition speech-to-text speech-synthesis text-to-speech translation speech-translation customvision face-recognition face-detection face formrecognizer
Language:C# 11
xuchennlp / S2T
The project for speech translation
speech-translation speech-recognition speech-to-text
Language:Python 11