Zhang Cao's repositories
curve
Curve is a high-performance, lightweight-operation, cloud-native open source distributed storage system. Curve can be applied to: 1) mainstream cloud-native infrastructure platforms such as OpenStack and Kubernetes; 2) high-performance storage for cloud-native databases; 3) cloud storage middleware using S3-compatible object storage as the data store.
FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
paper_readings
Keeps track of the papers I have read and those still to be read
Ditto
This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System.
LearningOS_Record
Records my daily progress while learning os-comp2022-winter
LevelDBRead
Notes taken while reading the LevelDB source code
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
llama.cpp
LLM inference in C/C++
memkind
Memkind is an easy-to-use, general-purpose allocator which helps to fully utilize various kinds of memory available in the system, including DRAM, NVDIMM, and HBM
opendal
OpenDAL: Access data freely, painlessly, and efficiently
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
RocksDBRead
Notes taken while reading the RocksDB source code
runc
CLI tool for spawning and running containers according to the OCI specification
rust_study
My Rust study, based on the CS 110L course
TensorRT
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Tom-CaoZH.github.io
This is my homepage.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
xalloc
A Rust library for allocating both conventional DRAM-based memory and CXL-based memory
XD_EE_DSA_2022
My solutions to the XDU EE data structures and algorithms course
zenfs
ZenFS is a storage backend for RocksDB that enables support for ZNS SSDs and SMR HDDs.