AliceShen122's starred repositories

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python · License: MIT · Stargazers: 1182 · Issues: 0

Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

License: Apache-2.0 · Stargazers: 4769 · Issues: 0

SuperPrompt

SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.

Stargazers: 4610 · Issues: 0

mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Language: Python · License: MIT · Stargazers: 2946 · Issues: 0

VideoSys

VideoSys: An easy and efficient system for video generation

Language: Python · License: Apache-2.0 · Stargazers: 1717 · Issues: 0

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python · License: NOASSERTION · Stargazers: 4522 · Issues: 0

text-to-video-synthesis-colab

Text To Video Synthesis Colab

Language: Jupyter Notebook · License: Unlicense · Stargazers: 1452 · Issues: 0

Hotshot-XL

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL

Language: Python · License: Apache-2.0 · Stargazers: 1051 · Issues: 0

Show-1

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Language: Python · License: NOASSERTION · Stargazers: 1099 · Issues: 0

generative-models

Generative Models by Stability AI

Language: Python · License: MIT · Stargazers: 24407 · Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

Language: Python · License: MIT · Stargazers: 11410 · Issues: 0

Text2Video-Zero

[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators

Language: Python · License: NOASSERTION · Stargazers: 4017 · Issues: 0

CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python · License: Apache-2.0 · Stargazers: 8158 · Issues: 0

athena

An open-source implementation of a sequence-to-sequence-based speech processing engine

Language: C++ · License: Apache-2.0 · Stargazers: 953 · Issues: 0

SenseVoice

Multilingual Voice Understanding Model

Language: Python · License: NOASSERTION · Stargazers: 3098 · Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python · License: AGPL-3.0 · Stargazers: 31657 · Issues: 0

fish-speech

A brand-new TTS solution

Language: Python · License: NOASSERTION · Stargazers: 13477 · Issues: 0

ailia-models

A collection of pre-trained, state-of-the-art AI models for the ailia SDK

Language: Python · Stargazers: 2019 · Issues: 0

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python · License: MIT · Stargazers: 34400 · Issues: 0

encodec

State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python · License: MIT · Stargazers: 3474 · Issues: 0

descript-audio-codec

State-of-the-art audio codec with a 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio.

Language: Python · License: MIT · Stargazers: 1167 · Issues: 0

jailbreak_llms

[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

Language: Jupyter Notebook · License: MIT · Stargazers: 2624 · Issues: 0

VPTQ

VPTQ: a flexible and extreme low-bit quantization algorithm

Language: Python · License: MIT · Stargazers: 428 · Issues: 0

MemGPT

Letta (formerly known as MemGPT) is a framework for creating stateful LLM services.

Language: Python · License: Apache-2.0 · Stargazers: 11996 · Issues: 0

self-llm

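"Open-Source LLM User Guide" (《开源大模型食用指南》): a tutorial for quickly deploying open-source large models in a Linux environment, better suited to ** "babies" (i.e., newcomers).

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 8512 · Issues: 0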

RAGChecker

RAGChecker: A Fine-grained Framework for Diagnosing RAG

Language: Python · License: Apache-2.0 · Stargazers: 450 · Issues: 0

label-studio

Label Studio is a multi-type data labeling and annotation tool with a standardized output format.

Language: JavaScript · License: Apache-2.0 · Stargazers: 18915 · Issues: 0

ragas

Supercharge Your LLM Application Evaluations 🚀

Language: Python · License: Apache-2.0 · Stargazers: 6963 · Issues: 0

MMA-Diffusion

[CVPR2024] MMA-Diffusion: MultiModal Attack on Diffusion Models

Language: Python · License: NOASSERTION · Stargazers: 140 · Issues: 0

LAION-SAFETY

An open toolbox for NSFW & toxicity detection

Language: Jupyter Notebook · Stargazers: 49 · Issues: 0