DrakeYang1's starred repositories

whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Language: Jupyter Notebook | BSD-2-Clause | 337300

awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

Apache-2.0 | 157600

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | NOASSERTION | 613300

Awesome-Speaker-Diarization

Some comprehensive papers about speaker diarization

19700

Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

Language: Python | AGPL-3.0 | 3720900

Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Language: Python | Apache-2.0 | 32800

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Language: Python | Apache-2.0 | 654100

Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL series by Alibaba Cloud.

Language: Python | Apache-2.0 | 5700

finetune-Qwen2-VL

Language: Python | MIT | 10900

SillyTavern

LLM Frontend for Power Users.

Language: JavaScript | AGPL-3.0 | 765300

KoboldAI-Client

For GGUF support, see KoboldCPP: https://github.com/LostRuins/koboldcpp

Language: Python | AGPL-3.0 | 347700

GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Language: Python | 459900

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Language: Python | MIT | 562300

Efficient-Live-Portrait

Fast running Live Portrait with TensorRT and ONNX models

Language: Python | MIT | 12500

Q-Align

③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.

Language: Python | NOASSERTION | 24900

duilib

Language: C++ | MIT | 566100

twitter

AI Agent for Twitter Personality Analysis

Language: TypeScript | 121300

MeshAnythingV2

From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"

Language: Python | NOASSERTION | 55400

torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Language: Python | BSD-3-Clause | 319700

Stable-Hair

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Apache-2.0 | 34400

Husky-v1

Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and knowledge-based reasoning tasks.

Language: Python | 31500

exo

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Language: Python | GPL-3.0 | 731400

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | Apache-2.0 | 2754600

album-ai

AI-First Album: Chat with your gallery using plain language! LLM Vision + RAG + Album/Gallery.

Language: TypeScript | Apache-2.0 | 77000

InternLM

Official release of InternLM2.5 base and chat models. 1M context support

Language: Python | Apache-2.0 | 628100

OpenGlass

Turn any glasses into AI-powered smart glasses

Language: C | MIT | 327600

LivePortrait

Bring portraits to life!

Language: Python | NOASSERTION | 1201100

mem0

The Memory layer for your AI apps

Language: Python | Apache-2.0 | 2201500
