Sanjoy Bose's repositories
rtb-papers
A collection of research and survey papers on real-time bidding (RTB) based display advertising techniques.
vaex
Out-of-core DataFrames for Python and ML: visualize and explore big tabular data at a billion rows per second 🚀
cortex
Deploy machine learning models to production
mediapipe
MediaPipe is the simplest way for researchers and developers to build world-class ML solutions and applications for mobile, edge, cloud and the web.
awesome-workflow-engines
A curated list of awesome open source workflow engines
flintrock
A command-line tool for launching Apache Spark clusters.
sagemaker-spark
A Spark library for Amazon SageMaker.
indicnlp_catalog
A collaborative catalog of resources for Indian language NLP
faas
OpenFaaS - Serverless Functions Made Simple
tink
Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.
amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
detect-secrets
An enterprise friendly way of detecting and preventing secrets in code.
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
forecasting
Time Series Forecasting Best Practices & Examples
fairlearn
A Python package to assess and improve fairness of machine learning models.
MLOps_VideoAnomalyDetection
Operationalize a video anomaly detection model with Azure ML
annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
dagster
A Python library for building data applications: ETL, ML, Data Pipelines, and more.
argo
Argo Workflows: Get stuff done with Kubernetes.
open-data-registry
A registry of publicly available datasets on AWS
ludwig
Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code.
marquez
Collect, aggregate, and visualize a data ecosystem's metadata
snowplow
Cloud-native web, mobile and event analytics, running on AWS and GCP
awesome-public-datasets
A topic-centric list of HQ open datasets.
sope
Apache Spark ETL Utilities
forwardsecrecy
The project aims to simplify the use of the elliptic curve Curve25519 with Diffie-Hellman key exchange. The work is in line with the Account Aggregator Specification.
ml-readings
A list of papers / videos / tutorials / blog posts on machine learning
cyberprobe
Capturing, analysing and responding to cyber attacks
machine-learning-systems-design
A booklet on machine learning systems design with exercises