Zhenyu He (zhenyuhe00)

Company: Peking University

Location: Beijing, China

Home Page: zhenyuhe00.github.io


Zhenyu He's starred repositories

ChunkLlama

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 293 · Issues: 0

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

License: Apache-2.0 · Stargazers: 3068 · Issues: 0

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 882 · Issues: 0

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch

Language: Python · License: MIT · Stargazers: 600 · Issues: 0

ml4se

A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering

Stargazers: 651 · Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stargazers: 3027 · Issues: 0

infini-transformer-pytorch

Implementation of Infini-Transformer in Pytorch

Language: Python · License: MIT · Stargazers: 95 · Issues: 0

ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Language: Python · License: MIT · Stargazers: 413 · Issues: 0

RULER

This repo contains the source code for RULER: What's the Real Context Size of Your Long-Context Language Models?

Language: Python · License: Apache-2.0 · Stargazers: 356 · Issues: 0

mixture-of-depths

An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

Language: Python · License: NOASSERTION · Stargazers: 31 · Issues: 0

DVMP

The official implementation of dual-view molecule pre-training.

License: MIT · Stargazers: 3 · Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · Stargazers: 23337 · Issues: 0

Transformer-M

[ICLR 2023] One Transformer Can Understand Both 2D & 3D Molecular Data (official implementation)

Language: Python · License: MIT · Stargazers: 197 · Issues: 0

VAR

[GPT beats diffusion 🔥] [scaling laws in visual generation 📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT · Stargazers: 3854 · Issues: 0

EasyKV

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Language: Python · Stargazers: 55 · Issues: 0

InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Language: Python · License: MIT · Stargazers: 240 · Issues: 0

TOVA

Token Omission Via Attention

Language: Python · License: Apache-2.0 · Stargazers: 113 · Issues: 0

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Language: Python · Stargazers: 322 · Issues: 0

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python · License: MIT · Stargazers: 628 · Issues: 0

EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Language: Python · License: Apache-2.0 · Stargazers: 545 · Issues: 0

JetMoE

Reaching LLaMA2 Performance with 0.1M Dollars

Language: Python · License: Apache-2.0 · Stargazers: 947 · Issues: 0

BiPE

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024

Language: Python · License: MIT · Stargazers: 19 · Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · Stargazers: 4118 · Issues: 0