Kane's repositories
applegpu
Apple G13 GPU architecture docs and tools
ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
asitop
Perf monitoring CLI tool for Apple Silicon
AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
cformers
SoTA Transformers with C-backend for fast inference on your CPU.
gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
HelloSilicon
An introduction to ARM64 assembly on Apple Silicon Macs
incubator-tvm
Open deep learning compiler stack for CPUs, GPUs and specialized accelerators
insn_bench_aarch64
Instruction latency & throughput profiler for AArch64
llama.cpp
Port of Facebook's LLaMA model in C/C++
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
LLVM_for_cpu0
A tutorial for learning LLVM: I implement a backend that compiles to machine code for cpu0, a simple RISC CPU.
ml-compiler-opt
Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
MMdnn
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
langchain
⚡ Building applications with LLMs through composability ⚡
llm-viz
3D Visualization of a GPT-style LLM
Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
MOSS
An open-source tool-augmented conversational language model from Fudan University
netron
Visualizer for neural network, deep learning and machine learning models
NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
stf
Control and manage Android devices from your browser.
XiangShan
Open-source high-performance RISC-V processor
XiangShan-doc
Documentation for XiangShan