The purpose of this repository is to discuss audio transformers.
Resources:
- MusicGen https://til.simonwillison.net/machinelearning/musicgen
- Pop2piano https://huggingface.co/docs/transformers/model_doc/pop2piano
- audiolm-pytorch https://github.com/lucidrains/audiolm-pytorch
- Music-Genre-Classification https://github.com/yadgire7/Music-Genre-Classification
- Samples https://github.com/julianstefinovic/thesis-audio-examples
- audio-quality-assessment https://github.com/ashutoshc8101/audio-quality-assessment/tree/main/notebooks
- Voice2Summary https://github.com/alimirash/Voice2Summary/blob/main/Voice2Summary.ipynb
- HuggingSound https://github.com/jonatasgrosman/huggingsound
- audio-instrument-classification https://github.com/qthuy2k1/audio-instrument-classification
- Is it Pop or Rock? Classify songs with Hugging Face 🤗 and Ray on Vertex AI https://medium.com/google-cloud/is-it-pop-or-rock-classify-songs-with-hugging-face-and-ray-on-vertex-ai-34b3ef1175f8
Cool Git Repos:
- SeisCLIP https://github.com/sixu0/SeisCLIP/tree/main/Zero_shot
- Multimodal Argumentation Mining https://github.com/StefanoColamonaco/Multimodal-AM/blob/main/main.ipynb
- Social-IQ-2.0-Multimodal-with-Emotional-Cues https://github.com/Derekxbj/Social-IQ-2.0-Multimodal-with-Emotional-Cues
Optimum Models:
- https://huggingface.co/helenai/MIT-ast-finetuned-speech-commands-v2-ov
- https://huggingface.co/docs/optimum/intel/inference#export-and-inference-of-stable-diffusion-models
- https://huggingface.co/blog/fine-tune-w2v2-bert
Speaker Diarization:
- https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py
- https://medium.com/@pierre_guillou/speech-to-text-get-transcription-with-speakers-from-large-audio-file-in-any-language-openai-8da2312f1617
- https://github.com/piegu/language-models/blob/master/speech_to_text_transcription_with_speakers_Whisper_Transcription_%2B_NeMo_Diarization.ipynb
- https://docs.openvino.ai/2023.3/notebooks/212-pyannote-speaker-diarization-with-output.html
Resources: