Yinpei Su's starred repositories

babilong

BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.

Language: Jupyter Notebook, Apache-2.0, 11500

awesome-generative-ai-guide

A one stop repository for generative AI research updates, interview resources, notebooks and much more!

Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.

Language: Python, Apache-2.0, 90300

unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Language: HTML, Apache-2.0, 766700

summary-of-a-haystack

Codebase accompanying the Summary of a Haystack paper.

Language: Jupyter Notebook, Apache-2.0, 6100

loft

LOFT: A 1 Million+ Token Long-Context Benchmark

Apache-2.0, 10400

LongICLBench

Code and Data for "Long-context LLMs Struggle with Long In-context Learning"

Language: Python, MIT, 7900

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language: Python, Apache-2.0, 90100

persona-hub

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Language: Python, 58200

RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Language: Python, Apache-2.0, 35400

LEval

[ACL'24] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Language: Python, GPL-3.0, 31200

Loong

[arxiv:2406.17419] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Language: Python, Apache-2.0, 3300

regular-investing-in-box

Regular investing changes your destiny: let time help you grow rich slowly. https://onregularinvesting.com

Language: Python, 556900

CLongEval

CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

Language: Jupyter Notebook, MIT, 3600

simple-evals

Language: Python, MIT, 136900

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python, Apache-2.0, 375500

LongBench

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Language: Python, MIT, 54400

LLMTest_NeedleInAHaystack

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Language: Jupyter Notebook, NOASSERTION, 132000

LongAlign

LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation

Language: Python, Apache-2.0, 15400

Awesome-Multimodal-Large-Language-Models

Latest Advances on Multimodal Large Language Models

1069200

leptonai

A Pythonic framework to simplify AI service building

Language: Python, Apache-2.0, 257500

LLMsPracticalGuide

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

904000

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook, Apache-2.0, 3425500

Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

CC0-1.0, 1617400

Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python, Apache-2.0, 72100

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Language: Python, Apache-2.0, 201900

AgentTuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Language: Python, 128200

awesome-instruction-datasets

A collection of awesome-prompt-datasets and awesome-instruction-datasets, gathering a wide variety of instruction datasets for training ChatLLMs such as ChatGPT.

Apache-2.0, 44200

InsTag

InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning

14700

Awesome-Chinese-LLM

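A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.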

1336700

syp1997