Clay's repositories
fast-llm-inference
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
gan-mnist-pytorch-implemented
A simple GAN test on MNIST, implemented in PyTorch.
font-to-png
Render a character as a PNG image from a font file.
highlight_code_convert_html
A simple script that converts your code to syntax-highlighted HTML.
unity-snake
A mobile version of the classic Snake game in which the snake can move in any direction.
ai-remove-background-website
A website that uses an AI model for background removal.
llm-kernel-foundry
Optimized CUDA Kernels
FlagEmbedding
Dense Retrieval and Retrieval-augmented LLMs
github-dir-dl
A command-line tool for downloading GitHub directories.
github-readme-stats
:zap: Dynamically generated stats for your github readmes
Latent-Self-Reflection-Model
A novel training method for improving a model's instruction following and reducing hallucination.
sglang
SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
TimeScheduler
This is a time scheduling application developed using React.js for the front-end and Python Flask for the back-end.
trl
Train transformer language models with reinforcement learning.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
website_crawler
A repository that records some websites I have crawled.