Yuqing's repositories

AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

alphafold

Open source code for AlphaFold.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

antares

Antares: an automatic engine for multi-platform kernel generation and optimization, supporting CPU, CUDA, ROCm, DirectX12, and GraphCore platforms.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

chatgpt-api

Node.js client for the official ChatGPT API. 🔥

Language: TypeScript | License: MIT | Stargazers: 0 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 0
Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0

cuvs

cuVS - a library for vector search and clustering on the GPU

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

faiss

A library for efficient similarity search and clustering of dense vectors.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0

incubator-tvm

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

pytorch-lightning-transformers

Fine-tune transformers with PyTorch Lightning

Language: Python | Stargazers: 0 | Issues: 0

TASO

The Tensor Algebra SuperOptimizer for Deep Learning

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0

triton

Development repository for the Triton language and compiler

License: MIT | Stargazers: 0 | Issues: 0

tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

License: MIT | Stargazers: 0 | Issues: 0