RicherMans

This repository contains the official implementation of the research paper "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" (ICCV 2023).

Language: Python | License: NOASSERTION | 1749 320

vivid

A themeable LS_COLORS generator with a rich filetype database

Language: Rust | License: Apache-2.0 | 1613 20 64

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | 1500 42 32

AudioSep

Official implementation of "Separate Anything You Describe"

Language: Python | License: MIT | 1452 66 21

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | 593 16 84

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

Language: Python | License: MIT | 584 9 221

klassy

Klassy is a highly customizable binary Window Decoration, Application Style and Global Theme plugin for recent versions of the KDE Plasma desktop.

Language: C++ | 565 9 114

BS-RoFormer

Implementation of Band Split Roformer, a SOTA attention network for music source separation out of ByteDance AI Labs

Language: Python | License: MIT | 294 10 25

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Language: Python | License: MIT | 292 16 42

SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Language: Python | License: NOASSERTION | 279 14 13

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python | License: MIT | 235 10 21

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | 197 18 22

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Language: Python | License: Apache-2.0 | 146 5 4

tiny-audio-diffusion

A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)

Language: Python | License: MIT | 133 6 3

StoryTTS

[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

Language: HTML | License: NOASSERTION | 126 18 1

VocalForge

Your one-stop solution for voice dataset creation

Language: Python | License: MIT | 101 9 12

DTTNet-Pytorch

An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation

Language: Python | License: Apache-2.0 | 61 4 2

kaldi-hmm-gmm

Language: C++ | License: NOASSERTION | 25 5 1

hf_transformers_custom_model_ced

🤗 Transformers custom model for CED.

Language: Python | License: Apache-2.0 | 5 2 1

sfi_convtasnet

Language: Python | License: MIT | 300