HLearning's repositories
unet_keras
A Keras implementation of UNet for image semantic segmentation
DIYRefresh
A pull-to-refresh framework
llama
Inference code for LLaMA models
mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
mlx
MLX: An array framework for Apple silicon
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
triton
Development repository for the Triton language and compiler
tvm
Open deep learning compiler stack for CPUs, GPUs and specialized accelerators
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
ComputeLibrary-Review
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.