SandishKumarHN's repositories
awesome-consensus
Awesome list for Paxos and friends
datacollector
StreamSets Data Collector - Continuous big data and cloud platform ingest infrastructure
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
flink
Mirror of Apache Flink
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
incubator-druid
Apache Druid (Incubating) - Column-oriented distributed data store ideal for powering interactive applications
kudu
Mirror of Apache Kudu
logging-log4j2
Mirror of Apache Logging Log4J2
nifi
Mirror of Apache NiFi
HTTP-Octopus
HTTP-Octopus
incubator-gearpump
Mirror of Apache Gearpump (Incubating)
incubator-pinot
Apache Pinot (Incubating) - A real-time distributed OLAP datastore
jvm-readings
JVM readings
LoveIt
❤️ A clean, elegant but advanced blog theme for Hugo
oozie
Mirror of Apache Oozie
papers-we-love
Papers from the computer science community to read and discuss.
polynote
A better notebook for Scala (and more)
presto
The official home of the Presto distributed SQL query engine for big data
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
rl
A modular, primitive-first, Python-first PyTorch library for Reinforcement Learning.
spark
Apache Spark - A unified analytics engine for large-scale data processing
spring-hadoop
Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring.
sqoop
Mirror of Apache Sqoop
system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.