Vagrant-Spark2
This repo is a fork of vagrant-jupyter, based on Spark 3.1.3 and Jupyter.
This is a Vagrant machine with Jupyter Notebook (v5) installed. The installed kernels are Python 3.8.x, Apache Spark 3.1.3 with Scala 2.12.x, and PySpark 3.1.3.
The notebook folder is: /home/vagrant.
Requirements
You have to install:
- VirtualBox 5 or later (this Vagrantfile was tested on VirtualBox 7.0.0)
- Oracle VM VirtualBox Extension Pack
- Vagrant
- The Vagrant plugin vagrant-vbguest, which keeps the guest additions on the VM up to date:
vagrant plugin install vagrant-vbguest
Install the Hostmanager plugin:
vagrant plugin install vagrant-hostmanager
Install the Disksize plugin:
vagrant plugin install vagrant-disksize
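Before installing, it can help to see which of the three plugins are still missing. The following is a minimal sketch, not part of this repo: `missing_plugins` is a hypothetical helper that assumes `vagrant plugin list` prints one plugin per line in the form `name (version, scope)`.

```shell
#!/bin/sh
# Plugins this README asks for.
REQUIRED_PLUGINS="vagrant-vbguest vagrant-hostmanager vagrant-disksize"

# missing_plugins (hypothetical helper): reads `vagrant plugin list` style
# output on stdin and prints each required plugin that is absent, one per line.
missing_plugins() {
    installed="$(cat)"
    for plugin in $REQUIRED_PLUGINS; do
        # Assumption: each installed plugin is listed as "name (version, scope)".
        if ! printf '%s\n' "$installed" | grep -q "^${plugin} "; then
            printf '%s\n' "$plugin"
        fi
    done
}

# Usage (requires Vagrant on the PATH):
#   vagrant plugin list | missing_plugins | while read -r p; do
#       vagrant plugin install "$p"
#   done
```

The helper reads from stdin rather than calling Vagrant itself, so it can be tried against saved output without a Vagrant installation.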
If you are going to use Vagrant on a Windows machine, you may get this error when starting a new VM:
rsync could not be found on your PATH. Make sure that rsync is properly installed on your system and available on the PATH.
In this case, install cwRsync and add the folder where you installed it to your %PATH%:
%SystemRoot%\system32;%SystemRoot%;%PROGRAMFILES%\cwRsync_5.4.1_x86_Free; ......
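To confirm that rsync is visible after changing the PATH, a quick check like the one below may be useful. This is a sketch that assumes a POSIX-style shell (for example Git Bash on Windows); `have_cmd` is a hypothetical helper name, not something provided by Vagrant or cwRsync.

```shell
#!/bin/sh
# have_cmd (hypothetical helper): succeeds when the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd rsync; then
    echo "rsync found: $(command -v rsync)"
else
    echo "rsync not found on PATH; install it before running 'vagrant up'" >&2
fi
```

Open a new terminal before running the check, since a changed PATH is usually only picked up by newly started shells.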
Installation
git clone https://github.com/kadnan/vagrant-spark2.git
cd vagrant-spark2
vagrant up
When the Vagrant provisioning and start-up processes have completed, you can point your browser to:
http://localhost:8888
Provisioning will take at least 20 minutes!
If you're behind a proxy:
Install the Proxy Configuration Plugin for Vagrant
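Because provisioning is slow, you may prefer to poll the notebook URL instead of refreshing the browser. The following is a minimal sketch under stated assumptions: `wait_until` is a hypothetical retry helper, and the attempt count and delay in the usage comment are arbitrary choices, not values taken from this repo.

```shell
#!/bin/sh
# wait_until (hypothetical helper): run a command up to $1 times, sleeping
# $2 seconds between attempts; succeeds as soon as the command succeeds.
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage (assumes curl is installed): poll every 30 seconds, up to 60 times.
#   wait_until 60 30 curl -fsS -o /dev/null http://localhost:8888 \
#       && echo "Jupyter is up"
```

The helper takes the command to retry as its arguments, so the same loop works with any other readiness check.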