Zhang Cao (Tom-CaoZH)



Company: Xidian University

Location: China

Home Page: https://tom-caozh.github.io/

Twitter: @tomcaottt


Zhang Cao's starred repositories

long-context-attention

Sequence Parallel Attention for Long-Context LLM Training and Inference

Language: Python · Stargazers: 160 · Issues: 0

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1154 · Issues: 0

nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.

Language: C++ · License: Apache-2.0 · Stargazers: 223 · Issues: 0

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python · License: Apache-2.0 · Stargazers: 266 · Issues: 0

MAGIS

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

Language: Python · License: MIT · Stargazers: 20 · Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stargazers: 2357 · Issues: 0

touying

Touying is a powerful package for creating presentation slides in Typst.

Language: Typst · License: MIT · Stargazers: 264 · Issues: 0

Honeycomb

Component-Model Framework in C++

Language: C++ · License: NOASSERTION · Stargazers: 44 · Issues: 0

foyer

Hybrid memory and disk cache in Rust

Language: Rust · License: Apache-2.0 · Stargazers: 45 · Issues: 0

nimble

New file format for storage of large columnar datasets.

Language: C++ · License: Apache-2.0 · Stargazers: 345 · Issues: 0

Sequoia

Scalable and robust tree-based speculative decoding algorithm

Language: Python · Stargazers: 265 · Issues: 0

TriForce

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Language: Python · Stargazers: 127 · Issues: 0

LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Language: Python · Stargazers: 396 · Issues: 0

SAS-Cache

[MSST '24] SAS-Cache: A Semantic-Aware Secondary Cache for LSM-based Key-Value Stores

License: MIT · Stargazers: 3 · Issues: 0

prophet-rocksdb

[MSST '24] Prophet: Optimizing LSM-Based Key-Value Store on ZNS SSDs with File Lifetime Prediction and Compaction Compensation.

Language: C++ · License: GPL-2.0 · Stargazers: 4 · Issues: 0

DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Language: Python · License: Apache-2.0 · Stargazers: 636 · Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · Stargazers: 21374 · Issues: 0

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python · License: MIT · Stargazers: 6284 · Issues: 0

calm

CUDA/Metal accelerated language model inference

Language: C · License: MIT · Stargazers: 318 · Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stargazers: 20072 · Issues: 0

fastmoe

A fast MoE (Mixture-of-Experts) implementation for PyTorch

Language: Python · License: Apache-2.0 · Stargazers: 1421 · Issues: 0

llamafile

Distribute and run LLMs with a single file.

Language: C++ · License: NOASSERTION · Stargazers: 16103 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM that outperforms Llama2-13B.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 4095 · Issues: 0

libbf

Bloom filters for C++11

Language: C++ · License: BSD-3-Clause · Stargazers: 352 · Issues: 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python · License: Apache-2.0 · Stargazers: 8577 · Issues: 0

Bamboo

Bamboo-7B Large Language Model

License: Apache-2.0 · Stargazers: 85 · Issues: 0