jpacerqueira-zz / SparkElasticSearchPublisher

Elasticsearch publisher using Hadoop as source and Spark 1.6 as ETL engine :: Running package for Cloudera CDH 5.9.0 Cluster

SparkElasticSearchPublisher

This is a Maven (pom.xml) based package with:

  • Spark 1.6 jobs to consume personalized social media data

  • Discover personal data profile

  • Transform into an Elastic Search Index with Daily Activity

  • Runs on the previous day's files; data is published with the Spark package org.elasticsearch.spark.sql
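
The flow described above (read the previous day's files, then publish through org.elasticsearch.spark.sql) could look roughly like the sketch below. It is not taken from this repository: the object name, the HDFS path layout, the JSON input format, the Elasticsearch host and the `daily_activity/activity` index/type are all assumptions made for illustration. It also assumes the elasticsearch-hadoop Spark connector is on the classpath.

```scala
import java.text.SimpleDateFormat
import java.util.Calendar

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrame

// Hypothetical job name; not part of the original package.
object DailyActivityPublisher {

  // Format the day before `now` as yyyy-MM-dd.
  // java.util.Calendar is used so the code also runs on Java 7.
  def previousDay(now: Calendar): String = {
    val day = now.clone().asInstanceOf[Calendar]
    day.add(Calendar.DAY_OF_MONTH, -1)
    val fmt = new SimpleDateFormat("yyyy-MM-dd")
    fmt.setTimeZone(day.getTimeZone)
    fmt.format(day.getTime)
  }

  // Assumed HDFS layout: one directory per day.
  def inputPath(day: String): String = s"/data/social/published/dt=$day"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("DailyActivityPublisher")
      .set("es.nodes", "localhost")        // assumed Elasticsearch host
      .set("es.port", "9200")
      .set("es.index.auto.create", "true")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)    // Spark 1.6 entry point for DataFrames

    // Read only the previous day's files (JSON input is an assumption).
    val day = previousDay(Calendar.getInstance())
    val activity = sqlContext.read.json(inputPath(day))

    // Write the DataFrame to an assumed index/type resource.
    activity.saveToEs("daily_activity/activity")

    sc.stop()
  }
}
```

A job like this would typically be packaged by the pom.xml build and submitted with `spark-submit --class DailyActivityPublisher` plus the path to the built jar. The date and path helpers are kept separate from `main` so they can be checked without a cluster.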

TODO :

  • Split jobs into /raw, /stage and /published data jobs - Pending
  • New app required to publish only totals from /published to Elasticsearch - Done
  • New Elasticsearch apps must run only in the elasticsearch.sql context - Done

License:Other


Languages

  • Scala 45.5%
  • Shell 28.9%
  • Python 21.1%
  • Jupyter Notebook 4.6%