jklj077

Ren Xuancheng's starred repositories

candle

Minimalist ML framework for Rust

Language: Rust · License: Apache-2.0 · 1407600

open-webui

User-friendly WebUI for LLMs (Formerly Ollama WebUI)

Language: Svelte · License: MIT · 2657900

hugo-PaperMod

A fast, clean, responsive Hugo theme.

Language: HTML · License: MIT · 896700

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · 90100

template

This is the repository for the distill web framework

Language: JavaScript · License: Apache-2.0 · 76900

locust

Write scalable load tests in plain Python 🚗💨

Language: Python · License: MIT · 2396400

mediawiki-services-parsoid

This is a mirror from https://gerrit.wikimedia.org/g/mediawiki/services/parsoid/. See https://www.mediawiki.org/wiki/Developer_access for contributing.

Language: PHP · License: GPL-2.0 · 14800

crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: TypeScript · License: Apache-2.0 · 1272100

paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

Language: Java · License: Apache-2.0 · 202700

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · 40200

python-magic

A Python wrapper for libmagic

Language: Python · License: NOASSERTION · 256200

pandoc

Universal markup converter

Language: Haskell · License: NOASSERTION · 3292900

marker

Convert PDF to markdown quickly with high accuracy

Language: Python · License: GPL-3.0 · 1216800

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript · License: NOASSERTION · 3330400

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · License: NOASSERTION · 222500

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Language: Python · License: Apache-2.0 · 162700

MAP-NEO

Language: Python · 68000

internetarchive

A Python and Command-Line Interface to Archive.org

Language: Python · License: AGPL-3.0 · 154600

ia-download

Internet Archive downloader

Language: Jupyter Notebook · 200

llama-cpp-python

Python bindings for llama.cpp

Language: Python · License: MIT · 693700

dash-cookbook

Recipes for creating AI applications with APIs from DashScope (and friends)!

License: Apache-2.0 · 1900

python-markdownify

Convert HTML to Markdown

Language: Python · License: MIT · 83100

the-stack-v2

Code for the curation of The Stack v2 and StarCoder2 training data

Language: Jupyter Notebook · License: Apache-2.0 · 7000

octopack

🐙 OctoPack: Instruction Tuning Code Large Language Models

Language: Jupyter Notebook · License: MIT · 39400

unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · 1135600

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python · License: Apache-2.0 · 164500

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · 2192600

CodeQwen1.5

CodeQwen1.5 is the code version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.

Language: Python · 33600

web-content-extraction-benchmark

Web Content Extraction Benchmark

Language: Python · License: Apache-2.0 · 1300

yt-dlp

A feature-rich command-line audio/video downloader

Language: Python · License: Unlicense · 7422200