demonatic's starred repositories

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Language: Python | Stargazers: 135 | Issues: 0

veScale

A PyTorch Native LLM Training Framework

Language: Python | License: Apache-2.0 | Stargazers: 376 | Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 10256 | Issues: 0

CUDATracePreload

CUDATracePreload is a dynamic tracing tool for CUDA and NCCL API calls.

Language: C++ | License: MIT | Stargazers: 1 | Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 8849 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 33079 | Issues: 0

Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 475 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 11373 | Issues: 0

einops

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Language: Python | License: MIT | Stargazers: 7991 | Issues: 0

MagnumIO

Magnum IO community repo

Language: C++ | License: Apache-2.0 | Stargazers: 66 | Issues: 0

llama.cpp

LLM inference in C/C++

Language: C++ | License: MIT | Stargazers: 58686 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 6894 | Issues: 0

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8094 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION | Stargazers: 78714 | Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | Stargazers: 1483 | Issues: 0

LLMs_interview_notes

This repository mainly collects interview questions for large language model (LLM) algorithm engineers

License: Apache-2.0 | Stargazers: 1014 | Issues: 0

TensorNVMe

A Python library that transfers PyTorch tensors between CPU and NVMe

Language: C++ | Stargazers: 80 | Issues: 0

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | Stargazers: 7733 | Issues: 0

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python | License: Apache-2.0 | Stargazers: 7113 | Issues: 0

learn-nlp-with-transformers

We want to create a repo to illustrate the usage of transformers in Chinese

Language: Shell | Stargazers: 1709 | Issues: 0

obsidian-better-export-pdf

Obsidian PDF export enhancement plugin

Language: TypeScript | License: MIT | Stargazers: 207 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 11215 | Issues: 0

PatrickStar

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.

Language: Python | License: BSD-3-Clause | Stargazers: 741 | Issues: 0

AISystem

AISystem covers AI systems: full-stack low-level AI technologies including AI chips, AI compilers, and AI inference and training frameworks

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 8712 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 126477 | Issues: 0

ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python | License: Apache-2.0 | Stargazers: 38048 | Issues: 0

stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

Language: C++ | License: Apache-2.0 | Stargazers: 1092 | Issues: 0

obsidian-quickshare

📝 An Obsidian plugin for sharing encrypted Markdown notes on the web. Zero configuration required.

Language: TypeScript | License: MIT | Stargazers: 237 | Issues: 0

libalgebra

Fast C header-only library for popcnt, pospopcnt, and set algebraic operations

Language: C | License: Apache-2.0 | Stargazers: 42 | Issues: 0

Language: C++ | License: Apache-2.0 | Stargazers: 434 | Issues: 0