Youhe Jiang's repositories

IJCAI2023-OptimalShardedDataParallel

[IJCAI2023] An automated parallel training system that combines the advantages of both data and model parallelism. If you are interested, please visit/star/fork https://github.com/Youhe-Jiang/OptimalShardedDataParallel

Language: Python · License: MIT · Stargazers: 50 · Issues: 1
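
The sharded-data-parallel idea behind OSDP can be illustrated with PyTorch's built-in FullyShardedDataParallel wrapper. This is a minimal sketch of the general technique, not OSDP's own API; the model and tensor shapes are placeholders.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Run under torchrun so rank/world-size env vars are set, e.g.:
#   torchrun --nproc_per_node=2 this_script.py
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

model = nn.Transformer(d_model=512, nhead=8).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
src = torch.randn(10, 32, 512, device="cuda")
tgt = torch.randn(10, 32, 512, device="cuda")

loss = model(src, tgt).sum()
loss.backward()   # gradients are reduce-scattered to the ranks owning each shard
optimizer.step()  # each rank updates only its own shard
```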

Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Language: Python · License: Apache-2.0 · Stargazers: 2 · Issues: 0

alpa

Training and serving large-scale neural networks with auto parallelization.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

License: NOASSERTION · Stargazers: 0 · Issues: 0

DeepRL_PyTorch

Deep Reinforcement Learning code for study. Currently it covers the following algorithms: DQN, C51, QR-DQN, IQN, and QUOTA.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
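
For context on this algorithm family, here is a minimal sketch of the DQN temporal-difference update in PyTorch. The network sizes and batch data are made up for illustration; this is not code from the repository.

```python
import torch
import torch.nn as nn

# Toy online and target networks (4-dim state, 2 actions)
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())

gamma = 0.99
states = torch.randn(32, 4)            # fake replay-buffer batch
actions = torch.randint(0, 2, (32,))
rewards = torch.randn(32)
next_states = torch.randn(32, 4)
dones = torch.zeros(32)

# Q(s, a) for the actions actually taken
q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

# TD target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
with torch.no_grad():
    next_q = target_net(next_states).max(dim=1).values
    target = rewards + gamma * (1.0 - dones) * next_q

loss = nn.functional.mse_loss(q_values, target)
loss.backward()
```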

examples

A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

FasterTransformer

Transformer-related optimization, including BERT and GPT

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause · Stargazers: 0 · Issues: 0
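
A minimal usage sketch, assuming the flash_attn package's public flash_attn_func (tensor layout (batch, seqlen, nheads, headdim) in half precision, which may vary across versions); the shapes here are placeholders.

```python
import torch
from flash_attn import flash_attn_func  # flash-attention v2 public API

# (batch, seqlen, nheads, headdim), fp16 or bf16, on GPU
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

# Exact attention computed in tiles, never materializing the full
# (seqlen x seqlen) score matrix in GPU memory.
out = flash_attn_func(q, k, v, causal=True)  # (2, 1024, 8, 64)
```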

FlexFlow

A distributed deep learning framework that supports flexible parallelization strategies.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Hetu-Galvatron

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). If you are interested, please visit/star/fork https://github.com/PKU-DAIR/Hetu-Galvatron

Stargazers: 0 · Issues: 0

HexGen

Serving LLMs on heterogeneous decentralized clusters.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

llama

Inference code for LLaMA models

License: GPL-3.0 · Stargazers: 0 · Issues: 0

Megatron-DeepSpeed

Ongoing research on training transformer language models at scale, including BERT and GPT-2

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

mms

Multi-model serving

Language: Python · Stargazers: 0 · Issues: 0

MS-AMP

Microsoft Automatic Mixed Precision Library

License: MIT · Stargazers: 0 · Issues: 0
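
A minimal sketch of how such a mixed-precision wrapper is typically applied, following MS-AMP's documented msamp.initialize entry point; treat the exact signature and the "O2" opt level as assumptions, and the model and shapes as placeholders.

```python
import torch
import msamp  # assumption: MS-AMP's documented top-level initialize API

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Wraps model and optimizer so weights/gradients use low-precision (FP8)
# storage, with scaling handled by the library.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

x = torch.randn(16, 1024, device="cuda")
loss = model(x).float().sum()
loss.backward()
optimizer.step()
```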

MS-AMP-Examples

Examples for the MS-AMP package.

License: MIT · Stargazers: 0 · Issues: 0

ninja

A small build system with a focus on speed

License: Apache-2.0 · Stargazers: 0 · Issues: 0

pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

Pytorch-UNet

PyTorch implementation of U-Net for semantic image segmentation with high-quality images

License: GPL-3.0 · Stargazers: 0 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.

License: Apache-2.0 · Stargazers: 0 · Issues: 0
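
A minimal FP8 sketch assuming Transformer Engine's PyTorch API (te.Linear and te.fp8_autocast, as in its quickstart examples); the layer sizes are placeholders, and FP8 execution requires a Hopper- or Ada-class GPU.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 scaling recipe; HYBRID uses E4M3 for forward and E5M2 for backward.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

# Matmuls inside this context run in FP8 on supported GPUs.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()
```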

yolov7

Implementation of the paper "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"

License: GPL-3.0 · Stargazers: 0 · Issues: 0

Youhe-Jiang

Config files for my GitHub profile.

Stargazers: 0 · Issues: 1

youhe-jiang.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript · License: MIT · Stargazers: 0 · Issues: 0