
mzthhy's starred repositories

DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python · License: MIT · 3436 · 0 · 0

DataTager

A synthetic-data application for fine-tuning LLMs, and "From Data to AGI: Unlocking the Secrets of Large Language Model"

Language: Python · License: GPL-3.0 · 12 · 0 · 0

LLM4Chem

Official code repo for the paper "LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset"

Language: Python · License: MIT · 61 · 0 · 0

EasyInstruct

[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.

Language: Python · License: MIT · 357 · 0 · 0

Mol-Instructions

[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Language: Python · License: MIT · 234 · 0 · 0

Darwin

An open-source project dedicated to building a foundational large language model for the natural sciences, mainly physics, chemistry, and materials science.

Language: Jupyter Notebook · License: NOASSERTION · 187 · 0 · 0

SciCrawler

Web-scraping tool for downloading content from the following publishers: Elsevier, RSC, Web of Science, Springer Nature, and Wiley.

Language: Python · License: MIT · 14 · 0 · 0

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · 8818 · 0 · 0

BLOOM-LORA

Due to LLaMA's licensing restrictions, we try to reimplement BLOOM-LoRA (the BLOOM license is much less restrictive; see https://huggingface.co/spaces/bigscience/license) using Alpaca-LoRA and Alpaca_data_cleaned.json.

Language: Jupyter Notebook · License: Apache-2.0 · 184 · 0 · 0

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Language: Shell · License: NOASSERTION · 975 · 0 · 0

LegalQA-bloomz-560m

Finetuning a small BLOOMZ model (bloomz-560m) on a small dataset and with limited resources.

Language: Jupyter Notebook · 17 · 0 · 0

TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python · License: NOASSERTION · 1019 · 0 · 0

TianGong-AI-Unstructure

Language: Python · License: MIT · 49 · 0 · 0

nltk_data

NLTK Data

Language: Python · 1444 · 0 · 0

KnowLM

An Open-Sourced Knowledgeable Large Language Model Framework.

Language: Python · License: MIT · 1209 · 0 · 0

gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

Language: Python · License: NOASSERTION · 22300 · 0 · 0

Scripts

2 · 0 · 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · 23536 · 0 · 0

reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Language: TypeScript · License: Apache-2.0 · 6493 · 0 · 0
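The prefix usage described above can be sketched in Python. This is a minimal sketch, assuming the service is called by appending the complete target URL (scheme included) to `https://r.jina.ai/`. The names `READER_PREFIX` and `reader_url` are hypothetical helpers for illustration, not part of the reader project's API.

```python
from urllib.parse import urlparse

# Assumption: the reader is used by prepending this prefix to the full target URL.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Build the (hypothetical helper) reader URL for an absolute http(s) URL."""
    parsed = urlparse(target)
    # Reject inputs without a scheme or host, e.g. "example.com/page".
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    # The original URL is kept intact, scheme included, after the prefix.
    return READER_PREFIX + target


if __name__ == "__main__":
    print(reader_url("https://example.com/article"))
```

The sketch only builds the URL and makes no network request. Fetching the result with any HTTP client (for example `urllib.request.urlopen`) should return the page converted to LLM-friendly text.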

Awesome-RAG-Evaluation

The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey.

License: MIT · 75 · 0 · 0

MarineGPT

The official implementation of MarineGPT

Language: Python · License: NOASSERTION · 24 · 0 · 0

multi-llm-chat

An application for interacting with different LLMs, with the option to provide PDF, web, and CSV links as context.

Language: Python · License: Apache-2.0 · 15 · 0 · 0

HalluQA

Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"

Language: Python · License: Apache-2.0 · 107 · 0 · 0

Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

687 · 0 · 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python · 10076 · 0 · 0

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · 1757 · 0 · 0

LLMsPracticalGuide

A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)

9321 · 0 · 0

eRAG

Code and packages for the paper "Evaluating Retrieval Quality in Retrieval-Augmented Generation".

Language: Python · License: MIT · 12 · 0 · 0

test

Measuring Massive Multitask Language Understanding | ICLR 2021

Language: Python · License: MIT · 1160 · 0 · 0

llm-continual-learning-survey

Continual Learning of Large Language Models: A Comprehensive Survey

215 · 0 · 0