Kostas Stathoulopoulos (kstathou)

Location: London, UK

Home Page: https://kstathou.github.io/

Twitter: @kstathou

Kostas Stathoulopoulos's starred repositories

system-design-primer

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Language: Python | License: NOASSERTION | Stargazers: 272966 | Issues: 6547 | Issues: 320

recommenders

Best Practices on Recommendation Systems

Language: Python | License: MIT | Stargazers: 16181 | Issues: 261 | Issues: 786

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python | License: AGPL-3.0 | Stargazers: 9605 | Issues: 90 | Issues: 365

txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Language: Python | License: Apache-2.0 | Stargazers: 8977 | Issues: 90 | Issues: 767

Language: Jupyter Notebook | License: MIT | Stargazers: 5998 | Issues: 93 | Issues: 20

evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 5263 | Issues: 48 | Issues: 393

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Language: Python | License: Apache-2.0 | Stargazers: 3734 | Issues: 43 | Issues: 364

weaviate

Weaviate is an open-source vector search engine that stores both objects and vectors, letting you combine vector search with structured filtering while keeping the fault tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.

Language: Go | License: BSD-3-Clause | Stargazers: 3192 | Issues: 65 | Issues: 1636

featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

Language: Jupyter Notebook | License: MPL-2.0 | Stargazers: 1809 | Issues: 15 | Issues: 149

labelbox

Labelbox is the fastest way to annotate data to build and ship computer vision applications.

Language: JavaScript | License: Apache-2.0 | Stargazers: 1720 | Issues: 77 | Issues: 0

bert-extractive-summarizer

Easy to use extractive text summarization with BERT

Language: Python | License: MIT | Stargazers: 1393 | Issues: 25 | Issues: 111

responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

Language: TypeScript | License: MIT | Stargazers: 1365 | Issues: 31 | Issues: 280

skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

Language: Python | License: MIT | Stargazers: 918 | Issues: 25 | Issues: 75

fal

⚡ Fastest way to serve open source ML models to millions

Language: Python | License: Apache-2.0 | Stargazers: 536 | Issues: 15 | Issues: 24

LinkBERT

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

Language: Python | License: Apache-2.0 | Stargazers: 416 | Issues: 7 | Issues: 0

ml-design-patterns

Software Architecture for ML engineers

Concept

Concept Modeling: Topic Modeling on Images and Text

Language: Python | License: MIT | Stargazers: 193 | Issues: 5 | Issues: 19

ReFinED

ReFinED is an efficient and accurate entity linking (EL) system.

Language: Python | License: NOASSERTION | Stargazers: 188 | Issues: 18 | Issues: 26

dbt-ml-preprocessing

A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros.

Language: Python | License: MIT | Stargazers: 179 | Issues: 7 | Issues: 6

adatest

Find and fix bugs in natural language machine learning models using adaptive testing.

Language: Jupyter Notebook | License: MIT | Stargazers: 168 | Issues: 18 | Issues: 10

NodePiece

Compositional and Parameter-Efficient Representations for Large Knowledge Graphs (ICLR'22)

Language: Python | License: MIT | Stargazers: 139 | Issues: 7 | Issues: 7

SciREX

Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122

Language: Python | License: Apache-2.0 | Stargazers: 128 | Issues: 14 | Issues: 15

openalex-guts

The guts for computing data for OpenAlex. For more, see https://openalex.org/.

Language: Python | License: MIT | Stargazers: 122 | Issues: 11 | Issues: 0

snowflake-grafana-datasource

The Snowflake Grafana datasource plugin lets you visualize Snowflake data in Grafana dashboards and manage alerts on it.

Language: Go | License: Apache-2.0 | Stargazers: 67 | Issues: 6 | Issues: 39

ner-re-with-transformers-odsc2022

Building NER and RE components using HuggingFace Transformers

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 49 | Issues: 3 | Issues: 2

cord-19-search

Vespa application making an index of the CORD-19 dataset.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 39 | Issues: 2 | Issues: 0

osdg-tool

OSDG is an open-source tool that maps and connects activities to the UN Sustainable Development Goals (SDGs) by identifying SDG-relevant content in any text. The tool is available online at www.osdg.ai. API access available for research purposes.

Language: Python | License: LGPL-3.0 | Stargazers: 35 | Issues: 1 | Issues: 3

osdg-data

The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, validated by OSDG Community Platform (OSDG-CP) citizen scientists with respect to the Sustainable Development Goals (SDGs). The dataset is updated every quarter and published on Zenodo.