Beast code in Giters

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1011 | 0 | 0

langchain

🦜🔗 Build context-aware reasoning applications

Language: Jupyter Notebook | License: MIT | 90886 | 0 | 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | 29188 | 0 | 0

gradient-checkpointing

Make huge neural nets fit in memory

Language: Python | License: MIT | 2677 | 0 | 0

faiss

A library for efficient similarity search and clustering of dense vectors.

Language: C++ | License: MIT | 30056 | 0 | 0

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and VLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | 11357 | 0 | 0

GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

Language: Python | License: MIT | 7432 | 0 | 0

promptbase

All things prompt engineering

Language: Python | License: MIT | 5312 | 0 | 0

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | 11523 | 0 | 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 25006 | 0 | 0

CTranslate2

Fast inference engine for Transformer models

Language: C++ | License: MIT | 3154 | 0 | 0

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | 7511 | 0 | 0

google-research

Google Research

Language: Jupyter Notebook | License: Apache-2.0 | 33644 | 0 | 0

BigTranslate

BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

Language: Python | 212 | 0 | 0

self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Language: Python | License: Apache-2.0 | 4018 | 0 | 0

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | License: Apache-2.0 | 29274 | 0 | 0

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Language: Python | License: MIT | 165880 | 0 | 0

MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python | License: BSD-3-Clause | 25247 | 0 | 0

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | 55127 | 0 | 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | 34370 | 0 | 0

Opus-MT

Open neural machine translation models and web services

Language: Python | License: MIT | 581 | 0 | 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python | 9838 | 0 | 0

LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Language: Python | License: GPL-3.0 | 5666 | 0 | 0