demonatic's starred repositories
long-context-attention
Sequence Parallel Attention for Long Context LLM Training and Inference
CUDATracePreload
CUDATracePreload is a dynamic tracing tool for CUDA and NCCL API calls.
Megatron-LM
Ongoing research training transformer models at scale
Pai-Megatron-Patch
The official repository of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance and lower memory utilization in both training and inference.
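As a quick orientation to the FP8 path this library exposes, here is a minimal sketch using its PyTorch bindings; it assumes an FP8-capable GPU (e.g. Hopper) with the transformer_engine package installed, and the layer sizes are arbitrary placeholders.

```python
# Minimal sketch of Transformer Engine's FP8 autocast (assumes an FP8-capable
# GPU such as Hopper and an installed transformer_engine package).
import torch
import transformer_engine.pytorch as te

layer = te.Linear(768, 768).cuda()          # TE drop-in replacement for nn.Linear
inp = torch.randn(32, 768, device="cuda")

# Inside fp8_autocast, supported TE modules execute their GEMMs in FP8.
with te.fp8_autocast(enabled=True):
    out = layer(inp)
```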
LLMs_interview_notes
This repository mainly collects interview questions relevant to large language model (LLM) algorithm engineers.
TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe storage.
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.
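To illustrate the launch-anywhere pattern accelerate is built around, here is a minimal training-loop sketch; the model, data, and hyperparameters are placeholders, not anything prescribed by the library.

```python
# Minimal sketch of the Accelerate training pattern (model and data are toy placeholders).
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up device/distributed config automatically

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,))),
    batch_size=32,
)

# prepare() moves everything to the right device and wraps it for the
# chosen distributed setup (DDP, FSDP, DeepSpeed, ...).
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```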
learn-nlp-with-transformers
A repository illustrating the usage of the Transformers library, with tutorials in Chinese.
obsidian-better-export-pdf
Obsidian PDF export enhancement plugin
flash-attention
Fast and memory-efficient exact attention
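For a sense of the Python API, a minimal call might look like the sketch below; it assumes a CUDA device, fp16 tensors, and the library's (batch, seqlen, nheads, headdim) layout, with the shapes chosen arbitrarily.

```python
# Minimal sketch of calling FlashAttention (assumes a CUDA device and the
# flash-attn package; tensor layout is (batch, seqlen, nheads, headdim)).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies a lower-triangular mask, as in autoregressive decoding.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```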
PatrickStar
PatrickStar enables larger, faster, greener pretrained NLP models and democratizes AI for everyone.
transformers
🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
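As a minimal taste of the library's high-level API, the pipeline helper below downloads a default sentiment model on first use; the task and input sentence are arbitrary choices for illustration.

```python
# Minimal sketch of the Transformers pipeline API (downloads a default
# sentiment-analysis model on first use; input text is arbitrary).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Sequence parallelism makes long-context training tractable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```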
ColossalAI
Making large AI models cheaper, faster and more accessible
obsidian-quickshare
📝 An Obsidian plugin for sharing encrypted Markdown notes on the web. Zero configuration required.
libalgebra
Fast header-only C library for popcnt, pospopcnt, and set-algebraic operations
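For readers unfamiliar with the terms, popcnt simply counts the set bits in a word; here is a tiny pure-Python illustration of the operation (the library itself provides vectorized C implementations).

```python
# popcnt counts the 1-bits in a word; libalgebra supplies fast vectorized C
# versions of this and related set operations. Pure-Python illustration only.
x = 0b1011_0110
print(bin(x).count("1"))  # 5
print(x.bit_count())      # 5 (Python 3.10+)
```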