Takumi Ito's starred repositories
persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
instructor
Structured outputs for LLMs
distilabel
distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
LLMDataHub
A quick guide to trending instruction fine-tuning datasets
langkit
LangKit: an open-source toolkit for monitoring Large Language Models (LLMs). Extracts signals from prompts & responses to help ensure safety & security. Features include text quality, relevance metrics, & sentiment analysis. A comprehensive tool for LLM observability.
text-dedup
All-in-one text de-duplication
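As a rough illustration of the exact-duplicate removal that tools like text-dedup automate (alongside fuzzier methods such as MinHash), here is a minimal sketch using normalized content hashing; the function name and normalization choices are illustrative assumptions, not text-dedup's API:

```python
import hashlib

def dedup_exact(docs):
    """Keep the first occurrence of each document, treating texts that
    differ only in case or whitespace as duplicates. Illustrative only,
    not text-dedup's actual interface."""
    seen = set()
    unique = []
    for doc in docs:
        # Normalize whitespace and case, then hash for O(1) membership checks.
        normalized = " ".join(doc.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

# dedup_exact(["Hello  world", "hello world", "bye"]) keeps only the
# first "Hello  world" plus "bye".
```

Real corpus deduplication additionally handles near-duplicates (e.g., via MinHash/LSH or suffix arrays), which this exact-match sketch does not attempt.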
preprocess
Corpus preprocessing
AlignScore
ACL 2023 - AlignScore, a metric for factual consistency evaluation.
J-UniMorph
A UniMorph dataset for Japanese
DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.
TransformerLens
A library for mechanistic interpretability of GPT-style language models
uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives insights on how to resolve them.