Ethan's repositories

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

actionformer_release

Code release for ActionFormer (ECCV 2022)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

AMR-Benchmark

A Unified Implementation of Several Baseline Deep Learning Models for Automatic Modulation Recognition

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

asr

Shanghainese (Shanghai dialect) ASR (automatic speech recognition) model

Stargazers: 0 | Issues: 0 | Issues: 0

audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

AutoX

AutoX is an efficient AutoML tool aimed mainly at data mining tasks with tabular data.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

bark

🔊 Text-prompted Generative Audio Model

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

bisheng

Bisheng is an open LLM DevOps platform for next-generation AI applications.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

ctc_decoder

A CTC decoder for both online and offline ASR models

Language: C++ | Stargazers: 0 | Issues: 1 | Issues: 0

DecryptPrompt

A summary of Prompt & LLM papers, open-source data & models, and AIGC applications

Stargazers: 0 | Issues: 0 | Issues: 0

FastASR

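An efficient C++ implementation of model inference, based on the Conformer model used by PaddleSpeech. It runs smoothly even on ARM platforms such as the Raspberry Pi 4B.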

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

HierSpeechpp

The official implementation of HierSpeech++

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

HowToLiveLonger

A programmer's guide to living longer

License: Unlicense | Stargazers: 0 | Issues: 1 | Issues: 0

kws

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Model for All.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

phkit

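Phoneme toolkit. A handy phoneme-processing toolbox with modules for Chinese phonemes, English phonemes, text-to-pinyin conversion, text normalization, and more.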

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Pix2Text

Pix In, LaTeX & Text Out. Recognize Chinese and English text and math formulas from images.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Rerender_A_Video

[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector, Language Classifier and Spoken Number Detector

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

simple_ddp_test

Toy code for a DDP test

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 0

SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

StyleTTS

Official Implementation of StyleTTS

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0