Zhen Zhang's repositories
dt-autorun
Autorun distributed training experiments and gather logs
alpa
Training and serving large-scale neural networks
flash-attention
Fast and memory-efficient exact attention
grace
GRACE - GRAdient ComprEssion for distributed deep learning
kickstart.nvim
A launch point for your personal nvim configuration
Megatron-LM
Ongoing research training transformer language models at scale, including BERT & GPT-2
model-prepare
Generate models for serving
nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
nccl-tests
NCCL Tests
open_clip
An open source implementation of CLIP.
ratex
Yuan's fork of Ratex
slapo
A schedule language for progressive optimization of large deep learning model training
split-annotations
Source code for the split annotations project.
UGATIT-pytorch
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation