Dennis Huo's repositories
dataproc-initialization-actions
Run on all nodes of your cluster before the cluster starts - lets you customize your cluster
spark-dataflow
Provides a Spark backend for executing Dataflow pipelines.
airflow-gcp-examples
Repository with examples and smoke tests for the GCP Airflow operators and hooks
appengine-flask-skeleton
A skeleton for creating Python applications using the Flask framework on App Engine
arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
bigdata-interop
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
bigtop
Mirror of Apache Bigtop
cloud-bigtable-examples
Examples of how to use Cloud Bigtable both with GCE map/reduce and with standalone applications.
codelabs
Codelabs in various languages demonstrating usage of several tools & systems upon genomics data.
hadoop
Mirror of Apache Hadoop
hbase
Mirror of Apache HBase
hive
Mirror of Apache Hive
kaggle-dsb2
Kaggle's second annual Data Science Bowl
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
parquet-format
Apache Parquet
polaris
The interoperable, open source catalog for Apache Iceberg
spark
Mirror of Apache Spark
spark-csv
CSV data source for Spark SQL and DataFrames
zeppelin
Mirror of Apache Zeppelin
zlib
A massively spiffy yet delicately unobtrusive compression library.