Daniel Servén's starred repositories

the-algorithm

Source code for Twitter's Recommendation Algorithm

Language: Scala · License: AGPL-3.0 · Stargazers: 61832 · Issues: 374 · Issues: 966

vscodium

binary releases of VS Code without MS branding/telemetry/licensing

Language: Shell · License: MIT · Stargazers: 24493 · Issues: 212 · Issues: 1238

qdrant

Qdrant: high-performance, massive-scale vector database for the next generation of AI. Also available in the cloud: https://cloud.qdrant.io/

Language: Rust · License: Apache-2.0 · Stargazers: 19325 · Issues: 119 · Issues: 1170

datasette

An open source multi-tool for exploring and publishing data

Language: Python · License: Apache-2.0 · Stargazers: 9151 · Issues: 100 · Issues: 1772

dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 8534 · Issues: 96 · Issues: 378

super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 4477 · Issues: 44 · Issues: 651

deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

Language: Python · License: NOASSERTION · Stargazers: 3517 · Issues: 19 · Issues: 974

segment-geospatial

A Python package for segmenting geospatial data with the Segment Anything Model (SAM)

Language: Python · License: MIT · Stargazers: 2797 · Issues: 54 · Issues: 127

whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2606 · Issues: 32 · Issues: 426

stable-diffusion-tensorflow

Stable Diffusion in TensorFlow / Keras

Language: Python · License: NOASSERTION · Stargazers: 1574 · Issues: 25 · Issues: 49

refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Language: Python · License: Apache-2.0 · Stargazers: 1382 · Issues: 16 · Issues: 204

AI

Microsoft AI

Language: Python · License: MIT · Stargazers: 1370 · Issues: 87 · Issues: 33

eo-learn

Earth observation processing framework for machine learning in Python

Language: Python · License: MIT · Stargazers: 1103 · Issues: 45 · Issues: 159

graphein

Protein Graph Library

Language: Jupyter Notebook · License: MIT · Stargazers: 1007 · Issues: 19 · Issues: 150

deepscatter

Zoomable, animated scatterplots in the browser that scale to over a billion points

Language: TypeScript · License: NOASSERTION · Stargazers: 1005 · Issues: 15 · Issues: 59

poetry-dynamic-versioning

Plugin for Poetry to enable dynamic versioning based on VCS tags

Language: Python · License: MIT · Stargazers: 596 · Issues: 5 · Issues: 153
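The sketch below illustrates the general idea of deriving a package version from VCS tags. It is not the plugin's actual code or output format. It assumes the version is computed from `git describe --tags --long` output such as `v1.2.3-4-gabc1234`, which gives the tag, the number of commits since that tag, and the abbreviated commit hash. The function name `version_from_describe` and the `.postN.dev0+g<commit>` scheme are hypothetical choices that happen to produce valid PEP 440 strings.

```python
import re

# Hypothetical input format: `git describe --tags --long` output, e.g. "v1.2.3-4-gabc1234"
DESCRIBE_RE = re.compile(
    r"^v?(?P<base>\d+(?:\.\d+)*)-(?P<distance>\d+)-g(?P<commit>[0-9a-f]+)$"
)


def version_from_describe(describe: str) -> str:
    """Map `git describe --tags --long` output to a PEP 440 version string.

    Hypothetical scheme (not poetry-dynamic-versioning's own):
    - building exactly at a tag          -> "1.2.3"
    - N commits after the tag at <hash>  -> "1.2.3.postN.dev0+g<hash>"
    """
    match = DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unrecognized describe output: {describe!r}")
    base = match.group("base")
    distance = int(match.group("distance"))
    commit = match.group("commit")
    if distance == 0:
        # The checked-out commit is the tagged one: use the tag's version as-is.
        return base
    # Commits after the tag: a post-release dev build, with the commit as the local label.
    return f"{base}.post{distance}.dev0+g{commit}"


if __name__ == "__main__":
    print(version_from_describe("v1.2.3-0-gabc1234"))
    print(version_from_describe("v1.2.3-4-gabc1234"))
```

Encoding the commit as a PEP 440 local label (after `+`) keeps untagged builds distinguishable while still sorting after the tagged release.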

datamol

Molecular Processing Made Easy.

Language: Python · License: Apache-2.0 · Stargazers: 444 · Issues: 17 · Issues: 106

SpanMarkerNER

SpanMarker for Named Entity Recognition

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 376 · Issues: 9 · Issues: 42

tner

Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021"

Language: Python · License: MIT · Stargazers: 368 · Issues: 9 · Issues: 43

Uni-Fold

An open-source platform for developing protein models beyond AlphaFold.

Language: Python · License: Apache-2.0 · Stargazers: 362 · Issues: 7 · Issues: 70

gnome-shell-extension-alt-tab-scroll-workaround

Quick fix for the bug where scrolling in one application is repeated in another after switching between them with Alt+Tab (e.g., VS Code and Chrome)

Language: JavaScript · License: GPL-3.0 · Stargazers: 219 · Issues: 4 · Issues: 26

social-media-tutorials

Code dumps of YouTube/Twitter tutorials

Language: Jupyter Notebook · Stargazers: 160 · Issues: 8 · Issues: 3

flash-genomics-model

My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other hierarchical methods)

Language: Python · License: MIT · Stargazers: 52 · Issues: 6 · Issues: 3

sequence-learn

With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier.

Language: Python · License: Apache-2.0 · Stargazers: 22 · Issues: 4 · Issues: 1

E3C-Corpus

E3C is a freely available multilingual corpus (Italian, English, French, Spanish, and Basque) of semantically annotated clinical narratives, intended for linguistic analysis, benchmarking, and training of information extraction systems. It contains two types of annotations: (i) clinical entities (pathologies, symptoms, procedures, body parts, etc.) according to standard clinical taxonomies (i.e., SNOMED-CT, ICD-10); and (ii) temporal information and factuality (events, time expressions, and temporal relations) according to the THYME standard.

The corpus is organised into three layers with different purposes:
Layer 1: about 25K tokens per language with full manual annotation of clinical entities, temporal information and factuality, for benchmarking and linguistic analysis.
Layer 2: 50-100K tokens per language with semi-automatic annotations of clinical entities, to be used to train baseline systems.
Layer 3: about 1M tokens per language of non-annotated medical documents, to be exploited by semi-supervised approaches.

Researchers can use the benchmark training and test splits of the corpus to develop and test their own models. We trained several deep-learning-based models and provide baselines on the benchmark. Both the corpus and the trained models will be available through the ELG platform.

Stargazers: 20 · Issues: 0 · Issues: 0

bulk-labeling

A tool for quickly adding labels to unlabeled datasets

Language: Python · License: Apache-2.0 · Stargazers: 18 · Issues: 7 · Issues: 6

pycaprio

Python client to the INCEpTION annotation tool

Language: Python · License: MIT · Stargazers: 10 · Issues: 3 · Issues: 11

mlconfound

Tools for analyzing and quantifying effects of confounder variables on machine learning model predictions.

Language: Jupyter Notebook · License: GPL-3.0 · Stargazers: 6 · Issues: 1 · Issues: 1

meowlflow

Easily deploy and serve MLflow models with expressive APIs powered by FastAPI

Language: Python · License: Apache-2.0 · Stargazers: 3 · Issues: 0 · Issues: 0