Renien John Joseph's repositories
docker-spark-livy
Spark Standalone & Livy
generate-weather-data
Generate sample dummy weather data
trino-poc
Trino testing environment
ansible-spark-livy
Ansible roles to install a Spark Standalone cluster and Livy in Docker
bigdata-analytics-session
Practical session on ML and ML on Big Data
bigdata-diff
This repository contains utilities to compare large datasets
bigdata-utils
Big Data utilities to make data engineers' lives easier
cloud-bigtable-examples
Examples of how to use Cloud Bigtable both with GCE map/reduce as well as stand alone applications.
cloud-datasync
Big Data :elephant: & Cloud Data :cloud: Sync Tool: a cloud data sync tool for migrations, moving all partitioned and non-partitioned HDFS data to cloud buckets :cloud:.
cp-helm-charts
The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
docker-spark-stand-alone
Spark 2.4.7 standalone Docker image
functional-programming-js-practical
:books: Functional Programming Languages : Lecture (JavaScript)
functional-programming-scala-practical1
:books: Functional Programming Languages : Lecture 1 (Scala)
functional-programming-scala-practical2
📚 Functional Programming Languages : Lecture 2 (Scala)
play-with-spark-starter-kit
Play with Spark: Building Apache Spark with Play Framework
scikit-learn
scikit-learn: machine learning in Python
security-module
Security module for big data systems
self-attentive-parser
Constituency Parsing with a Self-Attentive Encoder (ACL 2018)
spark-ml-document-classification
An example of using Spark :sparkles: ML models to classify documents
Statistical-Data-Exploration-Using-Spark-2.0
Data Exploration Using Spark 2.0
TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs onto Apache Spark clusters
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
wasabi
Wasabi A/B Testing Service is a real-time, enterprise-grade, 100% API-driven project.