huafeng's repositories
CaptchaNetTrainer
A framework for learning deep learning training and testing procedures
the-Congestion-Control-Process-in-TCP-in-NS-3
Programming Assignment for CN2023: Understanding the Congestion Control Process in TCP in NS-3.
SuperServer
A high-performance server written in C/C++
Linux_syscall_demo
Test demos for Linux system calls.
MiniGPT4-on-MLU
A port of MiniGPT4 to the MLU370, supporting multi-card training and inference
cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
LLaMA-infer
An inference framework for LLaMA models
llama-study
Inference code for Llama models
CUDA-Learn-Notes
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
cambricon-pytorch
Build Cambricon PyTorch from source