fisher's starred repositories

GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Language: Python | Stargazers: 5142 | Issues: 0

bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python | License: BSD-3-Clause | Stargazers: 667 | Issues: 0

firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Language: TypeScript | License: AGPL-3.0 | Stargazers: 16824 | Issues: 0

lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Language: Python | License: Apache-2.0 | Stargazers: 6762 | Issues: 0

MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.

Language: Python | License: AGPL-3.0 | Stargazers: 12700 | Issues: 0

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 28892 | Issues: 0

omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

Language: Python | License: GPL-3.0 | Stargazers: 5139 | Issues: 0

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 140883 | Issues: 0

graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

Language: Python | License: MIT | Stargazers: 17937 | Issues: 0

OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it

Language: Java | License: BSD-3-Clause | Stargazers: 10833 | Issues: 0

omnivore

Omnivore is a complete, open source read-it-later solution for people who like reading.

Language: TypeScript | License: AGPL-3.0 | Stargazers: 12720 | Issues: 0

inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Language: Python | License: Apache-2.0 | Stargazers: 5030 | Issues: 0

mlflow

Open source platform for the machine learning lifecycle

Language: Python | License: Apache-2.0 | Stargazers: 18466 | Issues: 0

dask

Parallel computing with task scheduling

Language: Python | License: BSD-3-Clause | Stargazers: 12473 | Issues: 0

amphi-etl

Python-based Low-code ETL for data manipulation and transformation. Generates Python code you can deploy anywhere.

Language: TypeScript | License: NOASSERTION | Stargazers: 808 | Issues: 0

langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.

Language: Python | License: MIT | Stargazers: 31003 | Issues: 0

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python | License: Apache-2.0 | Stargazers: 4805 | Issues: 0

vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Language: Python | License: MIT | Stargazers: 11098 | Issues: 0

instructor

Structured outputs for LLMs

Language: Python | License: MIT | Stargazers: 7723 | Issues: 0

NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 498 | Issues: 0

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python | License: AGPL-3.0 | Stargazers: 9581 | Issues: 0

doccano

Open source annotation tool for machine learning practitioners.

Language: Python | License: MIT | Stargazers: 9466 | Issues: 0

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 2660 | Issues: 0

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 2659 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 36616 | Issues: 0

timesfm

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.

Language: Python | License: Apache-2.0 | Stargazers: 3658 | Issues: 0

terminalizer

🦄 Record your terminal and generate animated gif images or share a web player

Language: JavaScript | License: MIT | Stargazers: 15317 | Issues: 0

DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

Language: Python | License: MIT | Stargazers: 3459 | Issues: 0

openspg

OpenSPG is a Knowledge Graph Engine developed by Ant Group in collaboration with OpenKG, based on the SPG (Semantic-enhanced Programmable Graph) framework. Core Capabilities: 1) domain model constrained knowledge modeling, 2) facts and logic fused representation, 3) KAG will be natively supported soon, so please stay tuned...

Language: Java | License: Apache-2.0 | Stargazers: 647 | Issues: 0

termux-app

Termux - a terminal emulator application for Android OS, extensible by a variety of packages.

Language: Java | License: NOASSERTION | Stargazers: 35360 | Issues: 0