wav2vec2

This page lists 36 repositories under the wav2vec2 topic.

PaddlePaddle / PaddleSpeech
Easy-to-use speech toolkit including self-supervised learning models, SOTA/streaming ASR with punctuation, streaming TTS with text frontend, a speaker verification system, end-to-end speech translation and keyword spotting. Won the NAACL 2022 Best Demo Award.
transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr kws speech-recognition sound-classification voice-cloning vocoder voice-recognition self-supervised-learning wav2vec2 whisper code-switch
Language:Python 12345
s3prl / s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
speech-representation mockingjay representation-learning apc tera self-supervised-learning speech-pretraining vq-apc wav2vec vq-wav2vec wav2vec2 cpc pase decoar hubert distilhubert wavlm unispeech-sat decoar2 data2vec
Language:Python 2470
audeering / w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
speech-emotion-recognition deep-learning wav2vec2 transformer-models arousal dominance valence msp-podcast onnx
Language:Jupyter Notebook 529
oliverguhr / wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
speech-recognition wav2vec2 pyaudio wav2vec speech-to-text asr speech
Language:Python 373
pszemraj / vid2cleantxt
Python API & command-line tool to easily transcribe speech-based video files into clean text
audio audio-processing keyword keyword-extraction nlp python sentence sentence-boundary-detection speech speech-recognition speech-to-text spelling-correction transcription transformer video video-processing video-summarisation video-summarization wav2vec2 whisper
Language:Jupyter Notebook 215
inboxpraveen / LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2, where we'll be open for contributions to enable real-time meeting transcription! 🚀
huggingface huggingface-transformers llm llm-inference meeting-minutes minutes-of-meeting natural-language-processing nlp python speech-recognition speech-to-text transformers translation wav2vec2 web web-application webapplication whisper whisper-ai
Language:Python 155
khanld / ASR-Wav2vec-Finetune
⚡ Fine-tune Wav2vec 2.0 for Speech Recognition
asr pytorch speech-recognition wav2vec2 finetune-wav2vec huggingface speech-to-text vietnamese-speech-recognition
Language:Python 141
habla-liaa / ser-with-w2v2
Official implementation of the INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
speech-emotion-recognition wav2vec2 deep-learning tensorflow speech
Language:Jupyter Notebook 137
vietai / ASR
End-to-End Vietnamese Speech Recognition using wav2vec 2.0
asr asr-model wav2vec2 ctc-loss pretrained-weights end-to-end-speech-recognition
103
tuanio / noisy-student-training-asr
PyTorch implementation of Noisy Student Training for automatic speech recognition and automatic pronunciation error detection
aped conformer data-augmentation deep-learning machine-learning noisy-student nst pretrained pytorch semi-supervised-learning speech-recognition wav2vec2
Language:Python 97
thevasudevgupta / gsoc-wav2vec2
GSoC'2021 | TensorFlow implementation of Wav2Vec2
gsoc tensorflow wav2vec2 speech-to-text librispeech-dataset
Language:Jupyter Notebook 90
Telegram-Zalo / zac2022-lyric-alignment
Solution for Zalo AI Challenge 2022 - Lyrics Alignment
deep-learning dynamic-programming forced-alignment pytorch wav2vec2 music-alignment vietnamese
Language:Python 68
mikezzb / lyrics-sync
A deep learning lyrics-to-audio alignment system that generates synchronized lyrics from a song and its lyrics
ai deep-learning demucs jupyter-notebook lyrics machine-learning music music-information-retrieval python wav2vec2
Language:Jupyter Notebook 56
khanld / Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
wav2vec2 pretraining self-supervised speech-processing asr contrastive-learning quantization speech-recognition speech-to-text
Language:Python 55
lstrgar / self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
hubert speech-segmentation wav2vec2 deep-learning self-supervised-learning speech-technology
Language:Python 55
mmakiuchi / multimodal_emotion_recognition
Scripts used in the research described in the paper "Multimodal Emotion Recognition with High-level Speech and Text Features", accepted at the ASRU 2021 conference.
emotion-recognition speech-emotion-recognition text-emotion-detection wav2vec2 disentanglement-learning asru2021
Language:Python 53
vectominist / MiniASR
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
asr ctc speech-recognition speech-representation wav2vec2 hubert minimal pytorch fairseq s3prl
Language:Jupyter Notebook 53
HarunoriKawano / Wav2vec2.0
Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in PyTorch.
pytorch speech-recognition wav2vec2
Language:Python 52
pooya-mohammadi / audio-classification-pytorch
This project provides several approaches for training/fine-tuning an audio gender recognition model. The code can be used for any other audio classification task by simply changing the number of classes and the input dataset.
audio-classification deep-learning deep-utils python pytorch lstm transformers wav2vec2
Language:Jupyter Notebook 44
mt-upc / SHAS
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
audio-segmentation speech-translation speech-to-text speech wav2vec2
Language:Python 40
ECNU-Cross-Innovation-Lab / ShiftSER
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
hubert speech-emotion-recognition wav2vec2
Language:Python 39
ttop32 / wav2vec2-live-japanese-translator
Real-time Japanese speech recognition translator using wav2vec2
speech-to-text automatic-speech-recognition asr stt wav2vec2 japanese translator pyqt5 voice-recognition voice real-time live huggingface pytorch audio spoken-language-understanding translation speaker-recognition pyaudio fine-tuning
Language:Jupyter Notebook 39
Hamtech-ai / wav2vec2-fa
Fine-tune Wav2vec2, an ASR model released by Facebook
asr asr-model huggingface nlp speech-to-text transformer wav2vec2
Language:Jupyter Notebook 38
lucasgris / wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
wav2vec2 brazilian-portuguese portuguese wav2vec automatic-speech-recognition dataset speech-to-text
Language:Jupyter Notebook 35
hammaad2002 / ASRAdversarialAttacks
An ASR (Automatic Speech Recognition) adversarial attack repository.
adversarial-attacks adversarial-machine-learning asr carlini-wagner carlini-wagner-attack fgsm-attack pgd-adversarial-attacks pgd-attack projected-gradient-desent wav2vec2 huggingface huggingface-library huggingface-transformer huggingface-transformers transformers transformers-library transformers-model transformers-models
Language:Jupyter Notebook 33
mpoyraz / wav2vec2-turkish
Turkish Speech Recognition using Facebook's Wav2vec 2.0 models
speech-recognition wav2vec2 speech-to-text asr turkish
Language:Python 31
egorsmkv / asr-corpus-creator
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
asr audio audio-processing automatic-speech-recognition nemo wav2vec2 speech-recognition whisper
Language:Python 27
ECNU-Cross-Innovation-Lab / ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
automatic-speech-recognition speech-emotion-recognition wav2vec2
Language:Python 25
daanzu / wav2vec2_stt_python
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
speech-recognition speech-to-text speech python pytorch wav2vec2 wav2vec
Language:Python 24
yamahigashi / Wav2Vec2FBX
Recognize speech from an audio file and convert it into an FBX animation
wav2vec2 animation lipsync audio
Language:Python 24
JuJu2181 / Automatic-Nepali-Speech-Recognition-and-Summarizer
A system capable of converting Nepali speech to text and generating a summary of the text
abstractive-summarization deep-learning extractive-summarization machine-learning nepali nepali-nlp python speech-recognition wav2vec2 cnn-resnet-bilstm nepali-speech nepali-summary
Language:Jupyter Notebook 21
AmirAbaskohi / Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children
Preschool evaluation is crucial because it gives teachers and parents valuable insight into children's growth and development. The COVID-19 pandemic has highlighted the need for online assessment of preschool children. One of the abilities that should be tested is speech. Off-the-shelf Automatic Speech Recognition (ASR) systems do not help here, because they are pre-trained on voices that differ from children's in frequency and amplitude: most are pre-trained on data within a specific amplitude range, so their objectives do not prepare them for voices at other amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Combining masking with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
asr speech-recognition wav2vec2 dataset deep-learning
Language:Jupyter Notebook 20
Sreyan88 / Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
speech speech-classification toxicity-classification wav2vec2
Language:Jupyter Notebook 19
skit-ai / Map-Mix
The official implementation of the method discussed in the paper "Improving Spoken Language Identification with Map-Mix" (accepted at ICASSP 2023)
datamaps hubert language-identification mixup speech-processing spoken-language-identification spoken-language-recognition wav2vec2 xlsr confidence-labels
18
kingabzpro / WOLOF-ASR-Wav2Vec2
Audio preprocessing and fine-tuning of the wav2vec2-large-xlsr model on the AI4D Baamtu Datamation "Automatic Speech Recognition in WOLOF" data.
asr-model wav2vec2 wolof africa audio-processing audio facebook transcription
Language:Jupyter Notebook 17
jmaczan / asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
asr automatic-speech-recognition dysarthria dysarthric-speech wav2vec2 deep-learning self-supervised-learning
Language:Jupyter Notebook 15
