Dip_an's repositories
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
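A minimal sketch of the training-loop pattern Accelerate documents: wrap a plain PyTorch loop with `prepare()` and `accelerator.backward()`. The toy model and synthetic data below are stand-in assumptions.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the device / distributed config

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 10), torch.randint(0, 2, (64,))
    ),
    batch_size=8,
)

# prepare() moves everything to the right device(s) and wraps for DDP/FSDP
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```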
halutmatmul_for_windows
Stella Nera is the first Maddness accelerator achieving 15x higher area efficiency (GMAC/s/mm^2) and 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators in the same technology
Open-Llama
Complete training code for an open-source, high-performance Llama-style model, covering the full process from pre-training to RLHF.
flash-attention
Fast and memory-efficient exact attention
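A minimal sketch of calling `flash_attn_func` from this package; the shapes are arbitrary assumptions, and the kernel requires fp16/bf16 tensors on a CUDA device.

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing
# the seqlen x seqlen score matrix
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```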
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
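The repo ships Triton kernels; as a plain-PyTorch illustration (not this repo's API) of the associativity trick that makes attention linear in sequence length, with a simple elu+1 feature map as in Katharopoulos et al. (2020):

```python
import torch

def linear_attention(q, k, v):
    # q, k, v: (batch, heads, seqlen, dim)
    phi = lambda x: torch.nn.functional.elu(x) + 1   # positive feature map
    q, k = phi(q), phi(k)
    # Contract K with V first: O(seqlen * dim^2) instead of O(seqlen^2 * dim)
    kv = torch.einsum("bhsd,bhse->bhde", k, v)
    # Per-position normalizer (non-causal variant, for simplicity)
    z = 1 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + 1e-6)
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, z)

q = k = v = torch.randn(2, 8, 4096, 64)
out = linear_attention(q, k, v)  # (2, 8, 4096, 64)
```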
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
grok-1
Grok open release
lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
linear_open_lm
A repository for research on medium-sized language models.
llama3
The official Meta Llama 3 GitHub site
llamafile
Distribute and run LLMs with a single file.
LLM-Agents-Papers
A repo listing papers related to LLM-based agents
llm-foundry
LLM training code for Databricks foundation models
LLMTest_NeedleInAHaystack
Simple retrieval from LLMs at various context lengths to measure accuracy
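The idea, sketched with a hypothetical `generate(prompt) -> str` model call: bury a fact ("needle") at a chosen depth in a long context ("haystack") and check whether the model retrieves it.

```python
def needle_test(generate, haystack: str, depth: float) -> bool:
    needle = "The secret passcode is 7421."
    pos = int(len(haystack) * depth)               # bury the needle at a given depth
    context = haystack[:pos] + " " + needle + " " + haystack[pos:]
    answer = generate(context + "\n\nWhat is the secret passcode?")
    return "7421" in answer                        # did retrieval succeed?

# Sweep depths (and, in the real repo, context lengths) to map accuracy:
# results = {d: needle_test(generate, long_doc, d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)}
```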
MetaGPT
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
minLlama3
A quick & complete guide to Llama 3's architecture
nanoGPT-TK
The simplest, fastest repository for training/finetuning medium-sized GPTs. Now, with kittens!
ollama
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
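A minimal sketch against Ollama's documented local REST API (default port 11434); it assumes the server is running and the `llama3` model has been pulled.

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```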
othello_mamba
Evaluating the Mamba architecture on the Othello game
pykan
Kolmogorov-Arnold Networks
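pykan's own API differs; below is only a toy illustration of the KAN idea, where every edge carries a learnable univariate function, with a sine basis standing in for pykan's B-splines.

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Each edge (input i -> output j) gets its own learnable univariate
    function, here a sum of sine basis functions; pykan uses B-splines."""
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        # Per-edge basis coefficients: (out_dim, in_dim, n_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, n_basis))
        self.register_buffer("freqs", torch.arange(1, n_basis + 1).float())

    def forward(self, x):                                  # x: (batch, in_dim)
        basis = torch.sin(x.unsqueeze(-1) * self.freqs)    # (batch, in_dim, n_basis)
        # Each output node sums its in_dim edge functions (Kolmogorov-Arnold form)
        return torch.einsum("bik,oik->bo", basis, self.coef)

model = nn.Sequential(ToyKANLayer(2, 5), ToyKANLayer(5, 1))
out = model(torch.rand(16, 2))                             # (16, 1)
```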
pythia
The hub for EleutherAI's work on interpretability and learning dynamics
ThunderKittens
Tile primitives for speedy kernels
tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
torchscale
Foundation Architecture for (M)LLMs
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
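A minimal sketch of vLLM's offline inference API; the model name is an example placeholder and must be available locally or on the Hugging Face Hub.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```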
X_net
A new transformer architecture