Maozhou Ge (GHGmc2)

Company: @intel

Location: Shanghai

Maozhou Ge's starred repositories

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · Stars: 23441

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stars: 22191

Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

llama3-from-scratch

llama3 implementation, one matrix multiplication at a time

Language: Jupyter Notebook · License: MIT · Stars: 11343
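
In the spirit of that repo, a single causal self-attention step is just a handful of explicit matrix multiplications. The NumPy sketch below is illustrative only: the weights are random placeholders, not real Llama 3 parameters.

```python
import numpy as np

def attention(x, Wq, Wk, Wv, Wo):
    # Each step is one plain matrix multiplication, mirroring the repo's style.
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # project to queries/keys/values
    scores = q @ k.T / np.sqrt(q.shape[-1])       # scaled dot-product scores
    mask = np.triu(np.full(scores.shape, -np.inf), k=1)  # causal mask: no peeking ahead
    w = np.exp(scores + mask)
    w /= w.sum(-1, keepdims=True)                 # softmax over keys
    return (w @ v) @ Wo                           # weighted values, then output projection

d = 8
x = np.random.randn(5, d)                         # 5 token embeddings (placeholders)
out = attention(x, *(np.random.randn(d, d) for _ in range(4)))
```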

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python · License: Apache-2.0 · Stars: 7410
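
For orientation, a minimal Accelerate training loop looks like the following. The toy model and data are placeholders, but `Accelerator`, `prepare`, and `accelerator.backward` are the library's core entry points.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy regression setup; model and data are placeholders, not from any repo.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(256, 16), torch.randn(256, 1)),
                    batch_size=32)

accelerator = Accelerator()  # detects device / distributed setup from the launch config
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward(); handles mixed precision, scaling
    optimizer.step()
```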

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog · Stars: 6728

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python · License: MIT · Stars: 6380
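
The attention-sink idea is easy to sketch: keep the first few "sink" tokens plus a recent window of the KV cache and evict everything in between. The function below is an illustrative reimplementation of that policy, not the repo's API; the name, shapes, and defaults are assumptions.

```python
import torch

def evict_kv(keys, values, n_sink=4, window=1024):
    """Attention-sink eviction (illustrative): keep the first n_sink tokens
    plus the most recent `window` tokens; drop the middle of the cache.
    keys/values: [batch, heads, seq_len, head_dim]."""
    seq_len = keys.shape[2]
    if seq_len <= n_sink + window:
        return keys, values
    keep = lambda t: torch.cat([t[:, :, :n_sink], t[:, :, -window:]], dim=2)
    return keep(keys), keep(values)

k = v = torch.randn(1, 8, 5000, 64)
k, v = evict_kv(k, v)  # sequence dimension shrinks to 4 + 1024
```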

optax

Optax is a gradient processing and optimization library for JAX.

Language: Python · License: Apache-2.0 · Stars: 1574
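
Optax's core pattern is a `GradientTransformation` with `init`/`update`, applied functionally to a params pytree. A minimal Adam step on a toy least-squares loss:

```python
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.ones((3,)), "b": jnp.zeros(())}

def loss_fn(p, x, y):
    return jnp.mean((x @ p["w"] + p["b"] - y) ** 2)

opt = optax.adam(1e-2)               # a GradientTransformation
opt_state = opt.init(params)

x, y = jnp.ones((8, 3)), jnp.zeros((8,))
grads = jax.grad(loss_fn)(params, x, y)
updates, opt_state = opt.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)  # purely functional parameter update
```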

fastmoe

A fast MoE implementation for PyTorch

Language: Python · License: Apache-2.0 · Stars: 1483

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stars: 1400

maxtext

A simple, performant, and scalable JAX LLM!

Language: Python · License: Apache-2.0 · Stars: 1384

torchtitan

A native PyTorch Library for large model training

Language: Python · License: BSD-3-Clause · Stars: 1363

Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model weights, training data, evaluation data, and evaluation methods.

Language: Python · License: NOASSERTION · Stars: 1178

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · Stars: 985

EAGLE

Official Implementation of EAGLE-1 and EAGLE-2

Language: Python · License: Apache-2.0 · Stars: 670

LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Language: Python · License: MIT · Stars: 560

megalodon

Reference implementation of the Megalodon 7B model

Language: Cuda · License: MIT · Stars: 499

ByteTransformer

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

Language: C++ · License: Apache-2.0 · Stars: 441

long-context-attention

Sequence Parallel Attention for Long-Context LLM Training and Inference

Language: Python · License: Apache-2.0 · Stars: 244

FlagGems

FlagGems is an operator library for large language models, implemented in the Triton language.

Language: Python · License: Apache-2.0 · Stars: 176
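
The intended usage, as documented by the project, is a one-time switch that redirects supported PyTorch ops to Triton kernels; treat the `flag_gems.enable()` entry point below as an assumption based on the project's README rather than a verified API.

```python
import torch
import flag_gems

# Assumption: flag_gems.enable() globally patches supported aten ops so they
# dispatch to FlagGems' Triton implementations, per the project's README.
flag_gems.enable()

x = torch.randn(1024, 1024, device="cuda")
y = torch.mm(x, x)  # now served by a Triton GEMM kernel, if the op is supported
```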

LightSeq

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

nccl-rdma-sharp-plugins

RDMA and SHARP plugins for the NCCL library

Language: C · License: BSD-3-Clause · Stars: 149

modern-latex

A short guide to LaTeX that avoids legacy cruft.

Language: TeX · License: NOASSERTION · Stars: 115

ml-systems-papers

Curated collection of papers in machine learning systems

grouped_gemm

PyTorch bindings for CUTLASS grouped GEMM.

Language: Cuda · License: Apache-2.0 · Stars: 35
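
For orientation, grouped GEMM semantics can be stated as a plain PyTorch loop: one independent GEMM per group (e.g. per MoE expert), which the fused CUTLASS kernel performs in a single launch. The reference loop below is illustrative only and is not the repo's binding.

```python
import torch

def grouped_gemm_reference(a, b, batch_sizes):
    """Reference semantics of a grouped GEMM: group i multiplies a slice of
    `a` with batch_sizes[i] rows by expert weight b[i]. A fused kernel does
    all groups in one launch; this loop only illustrates the math."""
    out, start = [], 0
    for i, rows in enumerate(batch_sizes):
        out.append(a[start:start + rows] @ b[i])
        start += rows
    return torch.cat(out)

a = torch.randn(10, 4)       # 10 tokens, hidden size 4
b = torch.randn(3, 4, 8)     # 3 experts, each mapping 4 -> 8
y = grouped_gemm_reference(a, b, [3, 5, 2])  # -> shape (10, 8)
```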