Hkeee's repositories

PCEngine

[MLSys'23] Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds.

Language: Cuda · License: MIT · Stargazers: 7 · Issues: 1

Byte-GLM

An efficient implementation of the GLM large language model.

Language: Python · License: MIT · Stargazers: 0 · Issues: 1

ByteEngine

An LLM engine based on ByteTransformer.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

chatglm-throughput

A plugin to measure the throughput of LLMs such as ChatGLM; a generic measurement sketch follows below.

Language: Python · Stargazers: 0 · Issues: 0
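
A minimal sketch of what such a throughput measurement could look like, assuming a hypothetical `generate_fn` callable that returns the number of tokens produced for each prompt; this is illustrative only, not the plugin's actual API:

```python
import time

def measure_throughput(generate_fn, prompts, max_new_tokens=128):
    """Generic tokens-per-second measurement (illustrative, not the plugin's API).

    `generate_fn(prompt, max_new_tokens)` is a hypothetical callable assumed to
    return the number of tokens actually generated for that prompt.
    """
    start = time.perf_counter()
    total_tokens = sum(generate_fn(p, max_new_tokens) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed  # aggregate tokens per second
```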

FGMS

An efficient kernel implementation of the fused gather-matmul-scatter operation; an unfused reference sketch follows below.

Language: Cuda · License: MIT · Stargazers: 0 · Issues: 1
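
For reference, a minimal unfused PyTorch sketch of the gather-matmul-scatter pattern, assuming the kernel gathers input feature rows by index, multiplies them by a dense weight matrix, and scatter-adds the results into output rows; tensor and function names here are illustrative, not the repository's API:

```python
import torch

def gather_matmul_scatter_reference(features, weight, in_idx, out_idx, num_out):
    """Unfused reference for the pattern a fused kernel would replace.

    features: (N, C_in) input feature rows
    weight:   (C_in, C_out) dense weight matrix
    in_idx:   (M,) rows to gather from `features`
    out_idx:  (M,) destination rows in the output
    """
    gathered = features[in_idx]              # gather
    transformed = gathered @ weight          # matmul
    out = torch.zeros(num_out, weight.shape[1],
                      device=features.device, dtype=features.dtype)
    out.index_add_(0, out_idx, transformed)  # scatter-add
    return out
```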

FuseSage

A multi-stream acceleration strategy for GraphSAGE; an illustrative sketch follows below.

Language: Cuda · Stargazers: 0 · Issues: 0
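
The repository itself is CUDA; as an illustration of the general multi-stream idea only (not FuseSage's code), a PyTorch sketch that overlaps two independent aggregation-style workloads on separate CUDA streams, using a stand-in matmul rather than actual GraphSAGE kernels:

```python
import torch

streams = [torch.cuda.Stream() for _ in range(2)]
feats = [torch.rand(100_000, 256, device="cuda") for _ in range(2)]
weights = [torch.rand(256, 256, device="cuda") for _ in range(2)]
outs = [None, None]

# Launch the two independent batches on separate streams so their kernels
# can overlap instead of serializing on the default stream.
for i, s in enumerate(streams):
    with torch.cuda.stream(s):
        outs[i] = feats[i] @ weights[i]

torch.cuda.synchronize()  # wait for both streams to finish
```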

gnn-acceleration-framework-with-FPGA

Includes a compiler that encodes DGL GNN models into instructions, runtime software that transfers data and controls the accelerator, and hardware Verilog code that can be implemented on an FPGA.

Language: SystemVerilog · License: Apache-2.0 · Stargazers: 0 · Issues: 0

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

sparse-op-test

Test files to measure the performance of SpMM, SDDMM, and SpGEMM on GPUs; a minimal SpMM timing sketch follows below.

Language: Cuda · Stargazers: 0 · Issues: 1
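
A minimal sketch of how an SpMM timing on GPU could look in PyTorch, using CUDA events and a synthetic ~1%-dense matrix; this is illustrative and not the repository's own test scripts:

```python
import torch

M, K, N = 4096, 4096, 512
A = torch.rand(M, K, device="cuda")
A = A * (A > 0.99)               # keep roughly 1% of entries
A_sparse = A.to_sparse()         # COO sparse matrix
B = torch.rand(K, N, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

for _ in range(10):              # warm-up
    torch.sparse.mm(A_sparse, B)
torch.cuda.synchronize()

start.record()
for _ in range(100):
    C = torch.sparse.mm(A_sparse, B)
end.record()
torch.cuda.synchronize()
print(f"SpMM mean latency: {start.elapsed_time(end) / 100:.3f} ms")
```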

vllm_qwen

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

xformers-hacked

Hackable and optimized Transformer building blocks, supporting composable construction.

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0