AI-X-King

0 followers | 0 following


AI-X-King's repositories

AlpacaDataCleaned

Alpaca dataset from Stanford, cleaned and curated

Language: Python | Apache-2.0 | 0 | 0 | 0

apps

A benchmark for LLM coding

Language: Python | MIT | 0 | 0 | 0

awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Apache-2.0 | 0 | 0 | 0

brouhaha-vad

Language: Jupyter Notebook | 0 | 0 | 0

axolotl

Go ahead and axolotl questions

Apache-2.0 | 0 | 0 | 0

CodeFormer

[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Language: Python | NOASSERTION | 0 | 0 | 0

data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Apache-2.0 | 0 | 0 | 0

data_management_LLM

Collection of training data management explorations for large language models

0 | 0 | 0

datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Apache-2.0 | 0 | 0 | 0

DENT_DDSP

0 | 0 | 0

DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

CC-BY-4.0 | 0 | 0 | 0

ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)

MIT | 0 | 0 | 0

FasterTransformer

Transformer-related optimization, including BERT and GPT

Language: C++ | Apache-2.0 | 0 | 0 | 0

lhotse

Tools for handling speech data in machine learning projects.

Apache-2.0 | 0 | 0 | 0

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Apache-2.0 | 0 | 0 | 0

llama.cpp

LLM inference in C/C++

MIT | 0 | 0 | 0

promptbase

All things prompt engineering

MIT | 0 | 0 | 0

PSST

Prosodic Speech Segmentation with Transformers

MIT | 0 | 0 | 0

pyctcdecode

A fast and lightweight Python-based CTC beam search decoder for speech recognition.

Language: Python | Apache-2.0 | 0 | 0 | 0

pytorch-docker

Pure PyTorch Docker images.

Language: Shell | Apache-2.0 | 0 | 0 | 0

RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Apache-2.0 | 0 | 0 | 0

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ | Apache-2.0 | 0 | 0 | 0

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Apache-2.0 | 0 | 0 | 0

sherpa

Streaming and non-streaming ASR server for next-gen Kaldi

Language: Python | NOASSERTION | 0 | 0 | 0

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Apache-2.0 | 0 | 0 | 0

vad-1-without-upload

0 | 0 | 0

VAD-with-adversarial-multi-task-learning

Language: Python | 0 | 0 | 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Apache-2.0 | 0 | 0 | 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Jupyter Notebook | MIT | 0 | 0 | 0

whisper-finetune

Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.

Language: Python | MIT | 0 | 0 | 0