There are 5 repositories under the cloudera topic.
1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux..
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Code for the deployment of Hadoop clusters, written in Bourne or Bourne Again shell.
Guide to data platforms and tools
Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
Perl Utility Library for my other repos
Modeling Lifecycle with ACME Occupancy Detection and Cloudera
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
CDH-compliant Apache Phoenix
Huemul BigDataGovernance is a framework that runs on Spark, Hive and HDFS. It supports implementing a corporate single-source-of-data strategy based on Data Governance best practices. It lets you define tables with Primary Key and Foreign Key checks when inserting and updating data through the library, plus validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values and default values. It also lets you classify fields by the applicability of ARCO rights to ease compliance with GDPR-style data protection laws, and identify security levels and whether any kind of encryption is applied. In addition, it lets you add more complex validation rules on the same table.
Create Greenplum docker files
Big Data
Cloudera Flow Management Workshop with Apache NiFi
DocGenius AI - Generative AI Chatbot for your Documents - Powered by Cloudera Machine Learning (CML)
Geocoding and Reverse Geocoding at Scale
A small command-line tool to create a Kudu table from an Avro schema or from a SQL script
This is the final project I completed for my Big Data Expert Program at U-TAD in September 2017. It uses the following technologies: Apache Spark v2.2.0, Python v2.7.3, Jupyter Notebook (PySpark), HDFS, Hive, Cloudera Impala, Cloudera HUE and Tableau.
A quick and dirty CDH cluster skeleton using Docker for Testing
MiNiFi Agent Configuration and Scripts for the NVIDIA Jetson Nano device