A collection of data science and machine learning problems solved with PySpark and Hadoop.
- Test PySpark
- Text classification on the IMDB dataset using logistic regression
- Text classification on the IMDB dataset using multinomial Naive Bayes
- Topic modelling with TF-IDF + LDA
- Word vectors
- Read the Iris CSV from Hadoop DFS (HDFS)
- PCA on the Iris dataset
- MNIST feed-forward network with sparkflow
- MNIST CNN with sparkflow
- MNIST RNN (LSTM) with sparkflow
- Fashion-MNIST Inception v1 with sparkflow
- Run Docker Compose with `compose/build`, or choose cluster mode with `docker-compose -f docker-compose-cluster.yml up --build --remove-orphans`.
- Visit localhost:8089 for a passwordless Jupyter Notebook.
- Check Hadoop health at localhost:9870.
- Hadoop DFS web UI: localhost:9870/explorer.html#/
- Hadoop NodeManager: localhost:8042/node
If cluster mode starts successfully, the workers log:
slave_2 | 2018-11-18 07:57:59 INFO Worker:54 - Successfully registered with master spark://192.168.128.2:7077
slave_1 | 2018-11-18 07:58:10 INFO Worker:54 - Successfully registered with master spark://192.168.128.2:7077
Check Spark health at localhost:8080.