Follow the Wiki to Set Up the Docker-based Environment

End-to-End, Real-time ML Reference Data Pipeline

Architecture Overview

Mapped to Code

Powered by the PANCAKE STACK!

Upcoming Workshops

Title

Building an End-to-End Streaming Analytics and Recommendations Pipeline with Spark, Kafka, and TensorFlow

Agenda (Full Day)

Part 1 (Analytics and Visualizations)

Analytics and Visualizations Overview (Live Demo!)
Verify Environment Setup (Docker, Cloud Instance)
Notebooks (Zeppelin, Jupyter/IPython)
Interactive Data Analytics (Spark SQL, Hive, Presto)
Graph Analytics (Spark, Elastic, NetworkX, TitanDB)
Time-series Analytics (Spark, Cassandra)
Visualizations (Kibana, Matplotlib, D3)
Approximate Queries (Spark SQL, Redis, Algebird)
Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

Streaming and Recommendations (Live Demo!)
Streaming (NiFi, Kafka, Spark Streaming, Flink)
Cluster-based Recommendation (Spark ML, Scikit-Learn)
Graph-based Recommendation (Spark ML, Spark Graph)
Collaborative-based Recommendation (Spark ML)
NLP-based Recommendation (CoreNLP, NLTK)
Geo-based Recommendation (Elasticsearch)
Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)
Save Workshop Environment for Your Use Cases

Locations and Dates

Washington DC: Saturday, June 18th
Seattle: Saturday, July 30th
Santa Clara: Saturday, August 6th
Chicago: Saturday, September 10th
Toronto: Saturday, September 17th
New York: Saturday, September 24th
Barcelona: Saturday, October 1st
Munich: Saturday, October 15th
London: Saturday, October 22nd
Brussels: Saturday, October 29th
Oslo: Monday, October 31st
Tokyo: Saturday, December 3rd
Shanghai: Saturday, December 10th
Beijing: Saturday, December 17th
Hyderabad: Saturday, December 24th
Bangalore: Saturday, December 31st
Sydney: Saturday, January 7th, 2017
Melbourne: Saturday, January 14th, 2017
Sao Paulo: Saturday, February 11th, 2017
Rio de Janeiro: Saturday, February 18th, 2017

Suggest a City and Date

Description

The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.
Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
Last, we productionize our pipeline and serve live recommendations to our users!
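
To make the collaborative filtering step above more concrete, here is a minimal sketch in plain Python. It is not the workshop's Spark ML code: the ratings data, the function names (`item_vectors`, `cosine`, `recommend`), and the choice of item-based cosine similarity are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); not data from the workshop.
RATINGS = {
    "alice": {"spark": 5.0, "kafka": 4.0, "zeppelin": 1.0},
    "bob": {"spark": 4.0, "kafka": 5.0},
    "carol": {"zeppelin": 5.0, "kibana": 4.0},
    "dave": {"spark": 5.0, "kafka": 4.0, "kibana": 1.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted
    average of the ratings the user gave to other items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in seen:
            continue  # only recommend items the user has not rated
        weighted = 0.0
        total_sim = 0.0
        for item, rating in seen.items():
            sim = cosine(cand_vec, vectors[item])
            weighted += sim * rating
            total_sim += sim
        if total_sim > 0:
            scores[candidate] = weighted / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

In a streaming setting the ratings would arrive continuously (for example from Kafka) and the model would be retrained or updated incrementally. This sketch recomputes everything in memory and is only meant to show the shape of the computation.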

Screenshots

Apache Zeppelin Notebooks

Stanford CoreNLP Sentiment Analysis

Jupyter/IPython Notebooks

SparkR Notebooks

TensorFlow Notebooks

Deploy Spark ML and TensorFlow Models into Production with Netflix OSS

Apache NiFi Data Flows

Airflow Workflows

Presto Queries

Tableau Integration

Beeline Command-line Hive Client

Log Visualization with Kibana & Logstash

Spark, Spark Streaming, and Spark SQL Admin UIs

Vector Host and Guest (Docker) System Metric UIs

Ganglia System and JVM Metrics Monitoring UIs

Tools Overview

About

Real-time, End-to-End, Advanced Analytics and Machine Learning Recommendation Pipeline

http://pipeline.io

Apache License 2.0

Languages

Jupyter Notebook 94.8%
JavaScript 1.4%
CSS 1.1%
Scala 0.9%
Python 0.7%
Shell 0.4%
C++ 0.4%
Java 0.1%
HTML 0.1%
Vim Script 0.1%
ApacheConf 0.0%
C 0.0%
XSLT 0.0%
Makefile 0.0%