Aniket Mokashi's repositories
Big-Data-Benchmark-for-Big-Bench
Big Bench Workload Development
bigdata-interop
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
datafu
Mirror of Apache DataFu
datahub
A Generalized Metadata Search & Discovery Tool
dataproc-initialization-actions
Scripts that run on all nodes of your cluster before the cluster starts, letting you customize it
flink
Apache Flink
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
iceberg
Apache Iceberg
incubator-druid
Apache Druid (Incubating) - Column-oriented distributed data store ideal for powering interactive applications
inverting-proxy
Reverse proxy that inverts the direction of traffic
OpenLineage
An Open Standard for lineage metadata collection
parquet-mr
Apache Parquet
presto
Distributed SQL query engine for big data
professional-services
Common solutions and tools developed by Google Cloud's Professional Services team
qUtils
Utility code for miscellaneous tasks
spark-1
Apache Spark - A unified analytics engine for large-scale data processing
spark-bigquery-connector
The connector uses the Spark SQL Data Source API to read data from Google BigQuery.
spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
spark-perf
Performance tests for Apache Spark
velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.