Xumi's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
DIS
The repository for our new project, Highly Accurate Dichotomous Image Segmentation.
Eureka
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models"
FasterTransformer
Transformer-related optimizations, including BERT and GPT
FGVC-PIM
PyTorch implementation of "A Novel Plug-in Module for Fine-Grained Visual Classification".
gpt4all
gpt4all: an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue
how-to-optimize-gemm
Row-major SGEMM optimization
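The entry above refers to a step-by-step tutorial on optimizing single-precision matrix multiply. As a hedged baseline sketch (pure Python standing in for the tutorial's C, with the function name `sgemm_naive` and the flat-list layout being illustrative assumptions, not the repo's actual code), the row-major starting point looks like:

```python
def sgemm_naive(A, B, M, N, K):
    """Compute C = A @ B for row-major matrices.

    A is M x K and B is K x N, each stored as a flat list in
    row-major order: element (i, j) of A lives at A[i*K + j].
    """
    C = [0.0] * (M * N)
    for i in range(M):
        for k in range(K):
            a_ik = A[i * K + k]  # hoist A(i, k) out of the inner loop
            for j in range(N):
                # i-k-j loop order walks B and C along contiguous rows,
                # a cache-friendly reordering such tutorials typically
                # apply early before moving on to blocking/tiling.
                C[i * N + j] += a_ik * B[k * N + j]
    return C
```

For example, multiplying the 2x2 matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] yields [[19, 22], [43, 50]], i.e. the flat result [19.0, 22.0, 43.0, 50.0].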
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
stanford_alpaca
Code and documentation to train Stanford's Alpaca models and generate the data.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
yolov7
Implementation of the paper "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"