jackfsuia's starred repositories

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers:990Issues:0Issues:0

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language:PythonLicense:Apache-2.0Stargazers:14530Issues:0Issues:0

Awesome-LLM-Inference

📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

License:GPL-3.0Stargazers:2279Issues:0Issues:0

RLHF-Label-Tool

用于大模型 RLHF 进行人工数据标注排序的工具。A tool for manual response data annotation sorting in RLHF stage.

Language:PythonStargazers:239Issues:0Issues:0

persona-hub

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Language:PythonStargazers:700Issues:0Issues:0

arxiv-sanity-preserver

Web interface for browsing, search and filtering recent arxiv submissions

Language:PythonLicense:MITStargazers:5099Issues:0Issues:0

Case_or_Rule

exploring whether LLMs perform case-based or rule-based reasoning

Language:PythonLicense:MITStargazers:20Issues:0Issues:0

crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language:PythonLicense:Apache-2.0Stargazers:3667Issues:0Issues:0

codellm-data-preprocess-pipeline

代码大模型 预训练&微调&DPO 数据处理 业界处理pipeline sota

Language:PythonStargazers:25Issues:0Issues:0

mergekit

Tools for merging pretrained large language models.

Language:PythonLicense:LGPL-3.0Stargazers:4363Issues:0Issues:0

cachecloud

搜狐视频(sohu tv)Redis私有云平台 :支持Redis多种架构(Standalone、Sentinel、Cluster)高效管理、有效降低大规模redis运维成本,提升资源管控能力和利用率。平台提供快速搭建/迁移,运维管理,弹性伸缩,统计监控,客户端整合接入等功能。(CacheCloud is a Redis cloud management platform. It supports Standalone, Sentinel, and Cluster architectures for Redis, effectively reducing large-scale Redis operation and maintenance costs, and improving resource management and utilization. The platform provides rapid construction/migration, operation and maintenance management, elastic scaling, statistical monitoring, client integration and access and other functions)

Language:HTMLLicense:Apache-2.0Stargazers:8744Issues:0Issues:0

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Language:PythonLicense:Apache-2.0Stargazers:3340Issues:0Issues:0

continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains

Language:TypeScriptLicense:Apache-2.0Stargazers:14622Issues:0Issues:0

marker

Convert PDF to markdown quickly with high accuracy

Language:PythonLicense:GPL-3.0Stargazers:15758Issues:0Issues:0

playwright-python

Python version of the Playwright testing and automation library.

Language:PythonLicense:Apache-2.0Stargazers:11383Issues:0Issues:0

wave-share

Serverless, peer-to-peer, local file sharing through sound

Language:C++License:MITStargazers:2159Issues:0Issues:0

label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

Language:JavaScriptLicense:Apache-2.0Stargazers:17935Issues:0Issues:0

EasySpider

A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。

Language:JavaScriptLicense:NOASSERTIONStargazers:32994Issues:0Issues:0

MediaCrawler

小红书笔记 | 评论爬虫、抖音视频 | 评论爬虫、快手视频 | 评论爬虫、B 站视频 | 评论爬虫、微博帖子 | 评论爬虫、百度贴吧帖子 | 百度贴吧评论回复爬虫

Language:PythonLicense:NOASSERTIONStargazers:16004Issues:0Issues:0
Language:PythonStargazers:811Issues:0Issues:0

search_with_lepton

Building a quick conversation-based search demo with Lepton AI.

Language:TypeScriptLicense:Apache-2.0Stargazers:7670Issues:0Issues:0

nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Language:PythonLicense:MITStargazers:8639Issues:0Issues:0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language:PythonLicense:Apache-2.0Stargazers:36206Issues:0Issues:0

anserini

Anserini is a Lucene toolkit for reproducible information retrieval research

Language:JavaLicense:Apache-2.0Stargazers:1009Issues:0Issues:0

Perplexica

Perplexica is an AI-powered search engine. It is an Open source alternative to Perplexity AI

Language:TypeScriptLicense:MITStargazers:12458Issues:0Issues:0

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language:PythonLicense:MITStargazers:5637Issues:0Issues:0

AutoWebGLM

An LLM-based Web Navigating Agent (KDD'24)

Language:PythonLicense:Apache-2.0Stargazers:567Issues:0Issues:0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language:PythonLicense:MITStargazers:11602Issues:0Issues:0

text2vec

text2vec, text to vector. 文本向量表征工具,把文本转化为向量矩阵,实现了Word2Vec、RankBM25、Sentence-BERT、CoSENT等文本表征、文本相似度计算模型,开箱即用。

Language:PythonLicense:Apache-2.0Stargazers:4356Issues:0Issues:0

document.ai

基于向量数据库与GPT3.5的通用本地知识库方案(A universal local knowledge base solution based on vector database and GPT3.5)

Language:PythonLicense:AGPL-3.0Stargazers:3607Issues:0Issues:0