Rui Wang's repositories
Deep-Approximate-Shapley-Propagation
A PyTorch implementation of the DASP algorithm from the paper "Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation"
apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
flash-attention
Fast and memory-efficient exact attention
google-research
Fork of the Google Research repository
Megatron-LM
Ongoing research training transformer models at scale
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.