Joao Pedro Afonso Cerqueira's repositories
Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
Jupyter_Spark_H2O_Kafka_Client_Setup
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
project_lost_saturn
This is the core of project lost_saturn. The lost_saturn project is a modern approach to data science, focused on enabling data science in containerised environments everywhere. It was built first as a local setup and then transformed into a container solution. Its tools are centralized in Jupyter, with Spark and H2O.ai AutoML. Ideal for running Jupyter notebooks in WSL (Windows Subsystem for Linux) or in Docker containers with Ubuntu 18.04 LTS.
technical-test-Jupyter-Spark-Delta-Pandas
Technical test GitHub repo for the test container
Terraform_start6Nodes_cdh5.xCluster
AWS CLI and Terraform setup for a 6-node Cloudera CDH cluster with Hadoop, Spark and Hive
airflow-executions
Apache Airflow for K8s clusters with Docker Compose orchestration. Examples include workflows for jobs such as webhooks and web scrapers.
jpac-sparklyr
H2O and sparklyr setup in RStudio, with demos/trials for Hadoop and Spark
spark-on-kubernetes
A deployment and setup of Apache Spark for multi-tenant usage in Kubernetes clusters. It deploys one executor per K8s pod, so it scales linearly.
SparkElasticSearchPublisher
Elasticsearch publisher using Hadoop as the source and Spark 1.6 as the ETL engine. Runnable package for a Cloudera CDH 5.9.0 cluster.
als-benchmark-scripts
Scripts to benchmark distributed Alternating Least Squares (ALS)
cluster-management-python-pyspark-ngrams-samples
cluster-management-python-pyspark-ngrams-samples
confluent-kafka-xperiments
Experiments with Confluent Kafka tools and client solutions
Docker-Container-Jupyter
Docker container for Jupyter notebooks, using another repo as its baseline hook
FiveCoolTest
Technical assignment
jpac-flume-logs
My adaptation of the flume-logs ingestion process
TensorFlowJava
TensorFlow in Java. If Google can do it, I can do it!