Beast code in Giters

yuguo's starred repositories

llm-inference-benchmark

LLM Inference benchmark

Language: Python | License: MIT | 30500

HIP-Performance-Optmization-on-VEGA64

14 basic topics for VEGA64 performance optimization

Language: C++ | 4700

dlsys_solution

Homework solutions for CMU 10-414/714 – Deep Learning Systems: Algorithms and Implementation

Language: Python | 3900

kuiperdatawhale

Language: C++ | License: MIT | 19600

hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library

Language: Assembly | License: MIT | 4600

amd-lab-notes

AMD lab notes with code examples to demonstrate use of AMD GPUs

Language: C++ | License: MIT | 8700

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | 177700

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

Language: Python | License: NOASSERTION | 1566400

oneflow-hip

Language: C++ | License: Apache-2.0 | 500

ChatGLM-Efficient-Tuning

Efficient fine-tuning of ChatGLM-6B with PEFT

Language: Python | License: Apache-2.0 | 363900

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | 3599700

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Language: Cuda | 129100

FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use, and extensible toolkit for large-scale models.

Language: Python | License: Apache-2.0 | 381400

ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python | License: Apache-2.0 | 3843700

AISystem

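AISystem mainly covers AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks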

Language: Jupyter Notebook | License: Apache-2.0 | 989800

oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

Language: C++ | License: Apache-2.0 | 581900

Paddle

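PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)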

Language: C++ | License: Apache-2.0 | 2195000

DeepBench

Benchmarking Deep Learning operations on different hardware

Language: C++ | License: Apache-2.0 | 106200

pytorch-cookbook

Language: Jupyter Notebook | 7600

rocBLAS

Next-generation BLAS implementation for the ROCm platform

Language: C++ | License: NOASSERTION | 33500

Tensile

Stretching GPU performance for GEMMs and tensor contractions.

Language: Python | License: MIT | 20400

VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

Language: C++ | License: MIT | 149200

gearshifft

Benchmark Suite for Heterogeneous FFT Implementations

Language: C++ | License: Apache-2.0 | 3400

rocFFT

Next-generation FFT implementation for ROCm

Language: C++ | License: NOASSERTION | 15500

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

Language: C | License: NOASSERTION | 588900

ROCm

AMD ROCm™ Software - GitHub Home

Language: Shell | License: MIT | 440400

CUDA_Freshman

Language: Cuda | 203500

pyadi-iio

Python interfaces for ADI hardware with IIO drivers (aka peyote)

Language: Python | License: NOASSERTION | 13600

heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing

Language: Python | License: Apache-2.0 | 32100

DeepLearningC

A simple program for learning CNNs (LeNet-5) in pure C

Language: C++ | 26600