looput

Company: Huazhong University of Science and Technology

Location: Shenzhen

looput's starred repositories

quivr

Open-source RAG Framework for building GenAI Second Brains 🧠 Build productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users ! Efficient retrieval augmented generation framework

Language: Python | License: NOASSERTION | Stargazers: 34449 | Issues: 277 | Issues: 1214

HanLP

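Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified-Traditional Chinese conversion, natural language processing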

Language: Python | License: Apache-2.0 | Stargazers: 33204 | Issues: 1142 | Issues: 1405

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 19301 | Issues: 297 | Issues: 1341

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 15321 | Issues: 103 | Issues: 992

pdf2htmlEX

Convert PDF to HTML without losing text or format.

Language: HTML | License: NOASSERTION | Stargazers: 10315 | Issues: 508 | Issues: 686

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

CodeGeeX

CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

Language: Python | License: Apache-2.0 | Stargazers: 8025 | Issues: 85 | Issues: 212

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Language: Python | License: MIT | Stargazers: 7923 | Issues: 113 | Issues: 300

CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model

Language: Python | License: Apache-2.0 | Stargazers: 7609 | Issues: 64 | Issues: 245

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

nlpaug

Data augmentation for NLP

Language: Jupyter Notebook | License: MIT | Stargazers: 4371 | Issues: 41 | Issues: 221

VisualGLM-6B

Chinese and English multimodal conversational language model

Language: Python | License: Apache-2.0 | Stargazers: 4049 | Issues: 40 | Issues: 349

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3453 | Issues: 24 | Issues: 439

MNBVC

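MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.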

dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Language: Python | License: NOASSERTION | Stargazers: 2487 | Issues: 40 | Issues: 22

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Language: Python | License: Apache-2.0 | Stargazers: 2044 | Issues: 29 | Issues: 134

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 1885 | Issues: 17 | Issues: 158

AutoChain

AutoChain: Build lightweight, extensible, and testable LLM Agents

Language: Python | License: MIT | Stargazers: 1764 | Issues: 11 | Issues: 10

Chrome-GPT

An AutoGPT agent that controls Chrome on your desktop

Language: Python | License: GPL-3.0 | Stargazers: 1639 | Issues: 22 | Issues: 28

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Language: Shell | License: NOASSERTION | Stargazers: 968 | Issues: 38 | Issues: 19

Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 588 | Issues: 8 | Issues: 111

deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Language: Python | License: Apache-2.0 | Stargazers: 439 | Issues: 6 | Issues: 24

WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Language: Python | License: MIT | Stargazers: 234 | Issues: 12 | Issues: 25

pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

Language: Python | License: MIT | Stargazers: 171 | Issues: 6 | Issues: 8

ERNIE-Layout-Pytorch

An unofficial Pytorch implementation of ERNIE-Layout which is originally released through PaddleNLP.

Language: Python | License: MIT | Stargazers: 96 | Issues: 3 | Issues: 20

SemDeDup

Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).

Language: Python | License: NOASSERTION | Stargazers: 90 | Issues: 3 | Issues: 8

finngen-tools

Tools for training causal language models for Finnish

Language: Python | License: MIT | Stargazers: 25 | Issues: 13 | Issues: 0