looput

Company: Huazhong University of Science and Technology

Location: Shenzhen

looput's starred repositories

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Language: Jupyter Notebook | License: MIT | Stargazers: 9703 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 1058 | Issues: 0

BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Language: Python | License: Apache-2.0 | Stargazers: 2780 | Issues: 0

crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: Python | License: Apache-2.0 | Stargazers: 3445 | Issues: 0

reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Language: TypeScript | License: Apache-2.0 | Stargazers: 5944 | Issues: 0
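
The "simple prefix" in the reader description can be illustrated with a minimal sketch: the target URL is appended to https://r.jina.ai/ to form the address to request. The helper name `to_reader_url` and the validation step are assumptions made for this illustration, not part of the reader project, and the sketch only builds the string without making any network call.

```python
from urllib.parse import urlparse

# Prefix quoted in the repository description above.
READER_PREFIX = "https://r.jina.ai/"


def to_reader_url(target_url: str) -> str:
    """Prepend the reader prefix to an absolute http(s) URL.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(target_url)
    # Reject relative or non-web URLs so the result is always well formed.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return READER_PREFIX + target_url


if __name__ == "__main__":
    print(to_reader_url("https://example.com/article"))
```

Fetching the resulting address with any HTTP client would then return the converted page, according to the description.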

edna

Note taking for developers and power users

Language: JavaScript | License: NOASSERTION | Stargazers: 354 | Issues: 0

LiveBench

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Language: Python | License: NOASSERTION | Stargazers: 159 | Issues: 0

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | Stargazers: 6310 | Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 8159 | Issues: 0

llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Language: Python | License: Apache-2.0 | Stargazers: 815 | Issues: 0

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 28326 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 22443 | Issues: 0

dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Language: Python | License: NOASSERTION | Stargazers: 2487 | Issues: 0

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

Stargazers: 5904 | Issues: 0

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3453 | Issues: 0

Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

Stargazers: 885 | Issues: 0

deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Language: Python | License: Apache-2.0 | Stargazers: 439 | Issues: 0

nlpaug

Data augmentation for NLP

Language: Jupyter Notebook | License: MIT | Stargazers: 4371 | Issues: 0

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 1891 | Issues: 0

Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 588 | Issues: 0

SemDeDup

Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically similar, but not exactly identical).

Language: Python | License: NOASSERTION | Stargazers: 90 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 37 | Issues: 0

finngen-tools

Tools for training causal language models for Finnish

Language: Python | License: MIT | Stargazers: 25 | Issues: 0

pdf2htmlEX

Convert PDF to HTML without losing text or format.

Language: HTML | License: NOASSERTION | Stargazers: 10315 | Issues: 0

pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

Language: Python | License: MIT | Stargazers: 171 | Issues: 0

HanLP

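Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing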

Language: Python | License: Apache-2.0 | Stargazers: 33205 | Issues: 0

MNBVC

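MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (an internet-slang writing style). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.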

License: MIT | Stargazers: 3272 | Issues: 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python | Stargazers: 9727 | Issues: 0

AutoChain

AutoChain: Build lightweight, extensible, and testable LLM Agents

Language: Python | License: MIT | Stargazers: 1764 | Issues: 0

WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Language: Python | License: MIT | Stargazers: 234 | Issues: 0