Manoj Mallela's repositories
aas
Code to accompany Advanced Analytics with Spark from O'Reilly Media
amazon-kinesis-scaling-utils
The Kinesis Scaling Utility lets you scale Amazon Kinesis Streams the same way you scale EC2 Auto Scaling groups: up or down by a count or by a percentage of the total fleet. You can also scale to an exact number of shards. You do not need to manage the allocation of the keyspace to shards when using this API; it is done automatically.
awesome-data-engineering
A curated list of data engineering tools for software developers
awesome-public-datasets
An awesome list of high-quality open datasets in the public domain (ongoing).
awesome-spark
A curated list of awesome Apache Spark packages and resources.
basic-spark
Use Apache Spark like a Swiss Army knife.
nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
datafusion-comet
Apache DataFusion Comet Spark Accelerator
db-migration
Databricks Migration Tools
director-scripts
Cloudera Director sample code
docker-elk
The ELK stack powered by Docker and Compose.
fluentd-benchmark
Benchmark collection of Fluentd use cases
gimme-aws-whitepapers
Download AWS whitepapers with minimal effort.
kubernetes-iperf3
Simple wrapper around iperf3 to measure network bandwidth from all nodes of a Kubernetes cluster
LastFM-LogAnalyzer
An Apache Spark application to analyze LastFM's userActivity logs
matplotlib-cheatsheet
Matplotlib 3.1 cheat sheet
scala-style-guide
Databricks Scala Coding Style Guide
tensorframes
TensorFlow wrapper for DataFrames on Apache Spark
terraform-aws-eks
A Terraform module to create an Elastic Kubernetes Service (EKS) cluster and associated worker instances on AWS.