Sigrid Jin (ง'̀-'́)ง oO's repositories
candle-vllm
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
mpc-uniqueness-check
MPC Uniqueness Check
smol-vision
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
cuda_practice
CUDA Playground
1.5-Pints
A compact LLM pretrained in 9 days using high-quality data
chatbot-starter
Minimal NextJS chatbot starter template
ComfyUI-Docker
🐳 Dockerfile for 🎨 ComfyUI. | Container image and launch scripts
dom-to-semantic-markdown
DOM to Semantic-Markdown for use in LLMs
ebpf_exporter
Prometheus exporter for custom eBPF metrics
freezegun
Let your Python tests travel through time
gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs or embedding models.
Liger-Kernel
Efficient Triton Kernels for LLM Training
llamatutor
An AI personal tutor built with Llama 3.1
llm-search
Querying local documents, powered by LLMs
mako
An extremely fast, production-grade web bundler based on Rust.
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Minitron
A family of compressed models obtained via pruning and knowledge distillation
rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
semantic-grep
grep for words with similar meaning to the query
sglang
SGLang is yet another fast serving framework for large language models and vision language models.
SmoothMQ
A drop-in replacement for SQS designed for great developer experience and efficiency.
spark-instructor
A library for building structured LLM responses with Spark
stable-diffusion.cpp
Stable Diffusion in pure C/C++
swiftide
Fast, streaming indexing and query library for AI (RAG) applications, written in Rust
tevatron
Tevatron - A flexible toolkit for neural retrieval research and development.
text-embeddings-inference
A blazing fast inference solution for text embeddings models