Jerome Banks's repositories
brickhouse
Hive UDFs for the data warehouse
experimental_bigdata-interop
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
satisfaction
The Next Generation Hadoop Scheduler
artemis-corpus-test-framework
A test framework for working with test corpora for unit tests.
aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
boilerpipe
Work-in-progress migration from Google Code
Chat-with-Github-Repo
This repository contains two Python scripts that demonstrate how to create a chatbot using Streamlit, OpenAI GPT-3.5-turbo, and Activeloop's Deep Lake.
docker-spark-k8s-aws
Docker image for running Spark 3 on Kubernetes on AWS
document-api-python
Create and modify Tableau workbook and datasource files
experimental_spark-bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
experimental_spark-bigquery-1
Google BigQuery support for Spark, SQL, and DataFrames
generalized-kmeans-clustering
This project generalizes the Spark MLlib Batch and Streaming K-Means clusterers in every practical way.
incubator-hivemall
Mirror of Apache Hivemall (incubating)
influxdb-java
Java client for InfluxDB
js-murmur3-128
A JavaScript implementation of the 128-bit variant of Murmur3 (compatible with Guava)
reactive-kafka
Reactive Streams API for Apache Kafka
redshift-auto-schema
Redshift Auto Schema is a Python library that takes a delimited flat file or parquet file as input, parses it, and provides a variety of functions that allow for the creation and validation of tables within Amazon Redshift.
sbt-google-cloud-storage
An SBT resolver and publisher for Google Cloud Storage
spark-glue
Spark releases with AWS Glue support
spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
spark-on-kubernetes-docker
Spark on Kubernetes infrastructure Docker images repo
spark-on-kubernetes-helm
Spark on Kubernetes infrastructure Helm charts repo