demonatic's starred repositories
long-context-attention
Sequence Parallel Attention for Long Context LLM Training and Inference
CUDATracePreload
CUDATracePreload is a dynamic tracing tool for CUDA and NCCL API calls.
Megatron-LM
Ongoing research training transformer models at scale
Pai-Megatron-Patch
The official repository of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance and lower memory utilization in both training and inference.
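As a quick orientation to the FP8 path this library exposes, here is a minimal sketch using its PyTorch bindings; it assumes an FP8-capable GPU (e.g. Hopper) with the transformer_engine package installed, and the layer sizes are arbitrary placeholders.

```python
# Minimal sketch of Transformer Engine's FP8 autocast (assumes an FP8-capable
# GPU such as Hopper and an installed transformer_engine package).
import torch
import transformer_engine.pytorch as te

layer = te.Linear(768, 768).cuda()          # TE drop-in replacement for nn.Linear
inp = torch.randn(32, 768, device="cuda")

# Inside fp8_autocast, supported TE modules execute their GEMMs in FP8.
with te.fp8_autocast(enabled=True):
    out = layer(inp)
```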
LLMs_interview_notes
This repository mainly collects interview questions relevant to large language model (LLM) algorithm engineers.
TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe storage.
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.
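To illustrate the launch-anywhere pattern accelerate is built around, here is a minimal training-loop sketch; the model, data, and hyperparameters are placeholders, not anything prescribed by the library.

```python
# Minimal sketch of the Accelerate training pattern (model and data are toy placeholders).
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up device/distributed config automatically

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,))),
    batch_size=32,
)

# prepare() moves everything to the right device and wraps it for the
# chosen distributed setup (DDP, FSDP, DeepSpeed, ...).
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```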
learn-nlp-with-transformers
A repository illustrating the usage of the Transformers library, with tutorials in Chinese.
obsidian-better-export-pdf
Obsidian PDF export enhancement plugin
flash-attention
Fast and memory-efficient exact attention
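For a sense of the Python API, a minimal call might look like the sketch below; it assumes a CUDA device, fp16 tensors, and the library's (batch, seqlen, nheads, headdim) layout, with the shapes chosen arbitrarily.

```python
# Minimal sketch of calling FlashAttention (assumes a CUDA device and the
# flash-attn package; tensor layout is (batch, seqlen, nheads, headdim)).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies a lower-triangular mask, as in autoregressive decoding.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```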
PatrickStar
PatrickStar enables larger, faster, greener pretrained NLP models and democratizes AI for everyone.
transformers
🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
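As a minimal taste of the library's high-level API, the pipeline helper below downloads a default sentiment model on first use; the task and input sentence are arbitrary choices for illustration.

```python
# Minimal sketch of the Transformers pipeline API (downloads a default
# sentiment-analysis model on first use; input text is arbitrary).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Sequence parallelism makes long-context training tractable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```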
ColossalAI
Making large AI models cheaper, faster and more accessible
obsidian-quickshare
📝 An Obsidian plugin for sharing encrypted Markdown notes on the web. Zero configuration required.
libalgebra
Fast header-only C library for popcnt, pospopcnt, and set-algebraic operations
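For readers unfamiliar with the terms, popcnt simply counts the set bits in a word; here is a tiny pure-Python illustration of the operation (the library itself provides vectorized C implementations).

```python
# popcnt counts the 1-bits in a word; libalgebra supplies fast vectorized C
# versions of this and related set operations. Pure-Python illustration only.
x = 0b1011_0110
print(bin(x).count("1"))  # 5
print(x.bit_count())      # 5 (Python 3.10+)
```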