Yanming W.'s repositories
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
alpa
Training and serving large-scale neural networks
ColossalAI
Making large AI models cheaper, faster, and more accessible
ColossalAI-Documentation
Documentation for Colossal-AI
compiler-explorer
Run compilers interactively from your web browser and interact with the assembly
detr
End-to-End Object Detection with Transformers
djl-serving
A universal, scalable machine learning model deployment solution
flash-attention
Fast and memory-efficient exact attention
llama.cpp
Port of Facebook's LLaMA model in C/C++
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
lm-evaluation-harness
A framework for few-shot evaluation of language models.
maskrcnn-benchmark
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
PipeEdge
PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices
tensorflow-fork
An Open Source Machine Learning Framework for Everyone
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGML/GGUF), and Llama models.
vllm-test
Miscellaneous test and benchmark code for vLLM