neos's repositories
tflite-micro
TensorFlow Lite for Microcontrollers
bitsandbytes
8-bit CUDA functions for PyTorch
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
crack_leetcode
Five days of problem drills, three days of mock interviews! Quickly master LeetCode solution patterns!
CUDA-Programming
Sample codes for my CUDA programming book
firesim-nvdla
FireSim-NVDLA: NVIDIA Deep Learning Accelerator (NVDLA) Integrated with RISC-V Rocket Chip SoC Running on the Amazon FPGA Cloud
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
ITRI-OpenDLA
OpenDLA demo and FPGA solution
llama-int8
Quantized inference code for LLaMA models
llama.onnx
LLaMA ONNX models and an ONNX Runtime demo
mlir-emitc
Conversions to MLIR EmitC
qlib
Qlib is an AI-oriented quantitative investment platform that aims to realize the potential of AI technologies in quantitative investment, empowering research and creating value. With Qlib, you can easily try out your ideas and create better quant investment strategies. An increasing number of SOTA quant research works/papers are released in Qlib.
relay-bench
A repository containing examples and benchmarks for Relay.
relay-mlir
An MLIR-based toy DL compiler for TVM Relay.
tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; MCUNetV3: On-Device Training Under 256KB Memory