ghbacct's starred repositories

maestro

Maestro: Netflix’s Workflow Orchestrator

Language: Java | License: Apache-2.0 | Stargazers: 2671 | Issues: 0

datachain

DataChain 🔗 AI-dataframe to enrich, transform and analyze data from cloud storages for ML training and LLM apps

Language: Python | License: Apache-2.0 | Stargazers: 510 | Issues: 0

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

Language: Python | License: BSD-3-Clause | Stargazers: 84 | Issues: 0

BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Language: Python | License: MIT | Stargazers: 5873 | Issues: 0

Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

Language: Python | License: BSD-3-Clause | Stargazers: 2905 | Issues: 0

texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python | License: MIT | Stargazers: 2880 | Issues: 0

Language: TypeScript | Stargazers: 256 | Issues: 0

prodigy-tui

A textual TUI for Prodigy

Language: CSS | License: NOASSERTION | Stargazers: 14 | Issues: 0

machine-learning-for-software-engineers

A complete daily plan for studying to become a machine learning engineer.

License: CC-BY-SA-4.0 | Stargazers: 28003 | Issues: 0

100-Days-Of-ML-Code

100 Days of ML Coding

License: MIT | Stargazers: 44095 | Issues: 0

data-science-interviews

Data science interview questions and answers

Language: HTML | License: CC-BY-4.0 | Stargazers: 8611 | Issues: 0

Language: Jupyter Notebook | License: MIT | Stargazers: 2226 | Issues: 0

dockerLLM

TheBloke's Dockerfiles

Language: Shell | License: MIT | Stargazers: 292 | Issues: 0

aici

AICI: Prompts as (Wasm) Programs

Language: Rust | License: MIT | Stargazers: 1876 | Issues: 0

Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust

Language: Rust | License: Apache-2.0 | Stargazers: 1946 | Issues: 0

drawdata

Draw datasets from within Jupyter.

Language: JavaScript | License: MIT | Stargazers: 747 | Issues: 0

feature-engineering-az

Source for book "Feature Engineering A-Z"

Language: HTML | Stargazers: 74 | Issues: 0

hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Language: Jupyter Notebook | License: BSD-3-Clause-Clear | Stargazers: 1638 | Issues: 0

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 35110 | Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python | License: LGPL-3.0 | Stargazers: 4231 | Issues: 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | Stargazers: 3843 | Issues: 0

Flowise

Drag & drop UI to build your customized LLM flow

Language: TypeScript | License: Apache-2.0 | Stargazers: 28325 | Issues: 0

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python | License: MIT | Stargazers: 1662 | Issues: 0

datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python | License: Apache-2.0 | Stargazers: 1830 | Issues: 0

instructor

Structured outputs for LLMs

Language: Python | License: MIT | Stargazers: 6947 | Issues: 0

shaderunner

Ctrl + F but fancy.

Language: TypeScript | License: MIT | Stargazers: 10 | Issues: 0

Language: Python | License: MIT | Stargazers: 7 | Issues: 0

ML-Papers-Explained

Explanations of key concepts in ML

Stargazers: 6969 | Issues: 0

projects

🪐 End-to-end NLP workflows from prototype to production

Language: Python | License: MIT | Stargazers: 1280 | Issues: 0

nlpaug

Data augmentation for NLP

Language: Jupyter Notebook | License: MIT | Stargazers: 4372 | Issues: 0