Doug Balog's repositories
bytecask
Key/value database inspired by Bitcask
data-validator
A tool to validate data built around Apache Spark.
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
fantasy-football
Choosing a fantasy football team using Spark, Hive, Python, and really just about anything.
getting-started
This repository is a getting started guide to Singer.
imap_tools
Work with email via IMAP
kafka-connect-cassandra
Kafka Connect Cassandra Connector. This project includes source/sink connectors for Cassandra to/from Kafka.
pennsylvania-vaccines
This is a centralized repository for the Pennsylvania Vaccine Updates bots.
petuum
SailingLab's Petuum project.
PowerGraph
PowerGraph: A framework for large-scale machine learning and graph computation.
rich
Rich is a Python library for rich text and beautiful formatting in the terminal.
scala-chart
Scala Chart Library
singer-python
Writes the Singer format from Python
spark-deep-learning
Deep Learning Pipelines for Apache Spark
sqlmesh
SQLMesh is a DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
tap-framework
A framework for rapidly prototyping new Singer taps
tap-shopify
Singer.io tap for extracting Shopify data
wumpus
Wumpus is an information retrieval system developed at the University of Waterloo. Its main purpose is to study issues that arise in the context of indexing dynamic text collections in multi-user environments.