therobotacademy / ml-bigdata_training

Using Spark, Hadoop and TensorFlow


BigData_training

1. Big Data Essentials: HDFS, MapReduce and Spark RDD by Yandex

Hadoop Yarn Notebook

Docker container with Hadoop Yarn Jupyter Notebook: yarn-notebook

To open a shell inside the container: $ docker exec -it hadoop_8881 bash
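If the Yarn container has not been started yet, a run command along the lines of the one used for the Spark image below should work. This is only a sketch: the image name bigdatateam/yarn-notebook and the 8881:8888 port mapping are assumptions inferred from the container name hadoop_8881, so adjust them to the image the course actually provides.

docker run -it -p 8881:8888 --name hadoop_8881 bigdatateam/yarn-notebook

With that mapping the notebook would be reachable at http://localhost:8881.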

Spark into Jupyter Notebook (PySpark)

Docker container with Spark Jupyter Notebook: spark-course1

docker run -it -p 8880:8888 --name spark_8880 bigdatateam/spark-course1

To open a shell inside the container: $ docker exec -it spark_8880 bash
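Once the container is running, Jupyter is reachable at http://localhost:8880 (because of the -p 8880:8888 mapping above). If you are unsure which host port is published, docker port shows it:

docker port spark_8880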

2. Big Data Applications: Machine Learning at Scale by Yandex

Docker container for Spark course 3: spark-course1
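Only the image name is given for this course; a run command analogous to the one above would look like the sketch below. The host port 8883 and the container name spark_8883 are arbitrary choices to avoid clashing with the first Spark container, not values specified by the course.

docker run -it -p 8883:8888 --name spark_8883 bigdatateam/spark-course1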

3. Introduction to Deep Learning by National Research University Higher School of Economics

https://github.com/hse-aml/intro-to-dl

Materials for this course are in the folder intro-to-dl.

Docker container with the Jupyter environment: coursera-aml-docker (source repository: https://github.com/ZEMUSHKA/coursera-aml-docker)

Week 3:

  • 3.1 Introduction to CNN
  • 3.2 Modern CNNs

Your first CNN on CIFAR-10

  1. Follow the instructions on https://hub.docker.com/r/zimovnov/coursera-aml-docker/ to install the Docker container with all the necessary software. After that you should see a Jupyter page in your browser.
docker run -it -p 8882:8080 --name coursera-aml-1 zimovnov/coursera-aml-docker

Add -p 7007:7007 if you want to access the TensorFlow dashboard (TensorBoard); see the sketch after this list.

  2. Alternatively, build it from the Dockerfile in the folder coursera-aml-docker:
docker build -t brjapon/coursera-aml-docker .
docker run -it -p 8882:8080 --name coursera-aml-10 brjapon/coursera-aml-docker
  3. Open a shell in the container and clone the repo with the exercises:
docker exec -it coursera-aml-1 bash
git clone https://github.com/hse-aml/intro-to-dl
  4. Download Keras and the week 3 resources by executing the required cells in the notebook ./intro-to-dl/download_resources.ipynb

  5. Run notebook Task 1

    ./intro-to-dl/week3/week3_task1_first_cnn_cifar10_clean.ipynb

  6. Run notebook Task 2

    ./intro-to-dl/week3/week3_task2_fine_tuning_clean.ipynb
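As noted in step 1, mapping -p 7007:7007 makes TensorBoard reachable from the host. A minimal sketch of starting it inside the container is shown below; the log directory is a placeholder, and TensorBoard is assumed to be installed alongside TensorFlow in the image.

docker exec -it coursera-aml-1 bash
tensorboard --logdir /root/logs --port 7007   # /root/logs is a placeholder, point it at your summaries

TensorBoard would then be available at http://localhost:7007.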

Container checkpoints

You might want to make a checkpoint of your work so that you can return to it later. Think of it as a backup or a commit in a version control system.

Saving container state

You will first have to stop the container, following the instructions above. Then save the container state so that you can return to it later:
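The stop command is not repeated here; for the container created in the steps above it would be something like:

docker stop coursera-aml-1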

docker commit coursera-aml-1 coursera-aml-snap-1

You can make sure that it's saved by running docker images.
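For example, filtering by the image name used above:

docker images coursera-aml-snap-1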

Creating a new container from a previous checkpoint

If you want to continue working from a particular checkpoint, run a new container from your saved image by executing

docker run -it -p 8882:8080 -p 7007:7007 --name coursera-aml-2 coursera-aml-snap-1

Notice that we incremented the index in the container name because we created a new container.
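Once the new container works, the old one can optionally be removed to reclaim the name and disk space (this is not part of the original instructions):

docker rm coursera-aml-1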

Using GPU in your container

You can use an NVIDIA GPU in your container on a Linux host machine.

Set up Docker following the instructions from NVIDIA:

https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)#prerequisites

In your container, replace the CPU TensorFlow version with the one that supports GPU:

pip3 uninstall tensorflow
pip3 install tensorflow-gpu==1.2.1

You will also have to install the NVIDIA GPU driver, CUDA Toolkit and cuDNN (requires registration with NVIDIA) in your container for TensorFlow to work with your GPU: https://www.tensorflow.org/versions/r1.2/install/install_linux#nvidia_requirements_to_run_tensorflow_with_gpu_support
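Once the driver, CUDA and cuDNN are in place, a quick way to confirm that TensorFlow sees the GPU is to list the local devices from inside the container; this is just a sanity-check sketch and assumes the tensorflow-gpu install above succeeded.

python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"

The output should include a GPU device entry (e.g. /gpu:0) if everything is set up correctly.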

It can be hard to follow, so you might choose to stick to a CPU version, which is also fine for the purpose of this course. TensorFlow provides Docker files with TensorFlow on GPU, but they don't have all the additional dependencies we need; this is for advanced users: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

  • 3.3 Application of CNNs

4. Applied AI with DeepLearning by IBM

https://github.com/romeokienzler/developerWorks

Materials for this course are in the folder developerWorks.

https://github.com/romeokienzler/CognitiveIoT
