Kai Sun's repositories
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
llama.cpp
LLM inference in C/C++
mlc-llm
Universal LLM Deployment Engine with ML Compilation
Qwen
The official repository of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
AIOS
AIOS: LLM Agent Operating System
chatglm.cpp
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4
chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU-only)
ComfyUI
The most powerful and modular Stable Diffusion GUI, API, and backend with a graph/nodes interface.
cuvs
cuVS: a library for vector search and clustering on the GPU
enchanted
Enchanted is an iOS and macOS app for chatting with private, self-hosted language models such as Llama2, Mistral, or Vicuna using Ollama.
face-detection-tflite
Face and iris detection for Python based on MediaPipe
fish-speech
A brand-new TTS solution
hagrid
HAnd Gesture Recognition Image Dataset
Kolors
Kolors Team
mediapipe-hand-crop-fix
Code for "Optimizing Hand Area Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Prevent Downstream Errors"
MiniCPM-V
MiniCPM-V 2.0: An Efficient End-side MLLM with Strong OCR and Understanding Capabilities
pytorch-forecasting
Time series forecasting with PyTorch
raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
SandDance
Visually explore, understand, and present your data.
swift
ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch), plus a PDF image-text pair dataset (about 600k pairs, English and Chinese)