mhassan2 / datafabric_splunk


📘 Full tutorial:


The purpose of this Docker container is to bring together the components that make up the Splunk Data Fabric embrace initiative. Practitioners can immediately try out Splunk's integration with external software components such as Hadoop, RDBMS, Kafka and NiFi, and search, visualize and analyze the pre-populated data. There is no hassle of setting each one up separately!

Get a copy of this repo on your local drive (optional):

Prerequisites (Mac OSX):

  1. Install Docker and allocate all available memory and CPU to the Docker daemon: Preferences -> Advanced -> slide the CPU and Memory sliders all the way to the right -> Apply & Restart

  2. Increase your Docker storage pool from 10G to 20G (this is destructive and will delete all volumes, containers and images):

cd ~/Library/Containers/com.docker.docker/Data/database/
git reset --hard

cat com.docker.driver.amd64-linux/disk/size

The number is in MB, so 20G is 20 × 1024 = 20480:

echo 20480 > com.docker.driver.amd64-linux/disk/size
git add com.docker.driver.amd64-linux/disk/size
git commit -s -m 'New target disk size'


rm ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
Make sure to restart docker.
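The resize steps above can be collected into one script. This is a minimal sketch that assumes the same file layout as the commands above and takes the stated unit (MB) at face value; `gb_to_mb` and `resize_docker_disk` are hypothetical helper names. Only the size conversion runs by default, because the resize itself deletes the disk image.

```shell
#!/bin/sh
# Convert a size in GB to the MB value written to the disk/size file.
gb_to_mb() {
  printf '%s\n' "$(( $1 * 1024 ))"
}

# Hypothetical helper that replays the steps above. DESTRUCTIVE: it deletes
# Docker.qcow2, so all volumes, containers and images are lost.
# The body runs in a subshell so the `cd` does not leak into the caller.
resize_docker_disk() (
  size_mb=$(gb_to_mb "$1")
  data="$HOME/Library/Containers/com.docker.docker/Data"
  cd "$data/database" &&
    git reset --hard &&
    printf '%s\n' "$size_mb" > com.docker.driver.amd64-linux/disk/size &&
    git add com.docker.driver.amd64-linux/disk/size &&
    git commit -s -m 'New target disk size' &&
    rm "$data/com.docker.driver.amd64-linux/Docker.qcow2"
)

# Show the value for 20G (prints 20480); call `resize_docker_disk 20` yourself
# only when you really want to wipe and resize, then restart Docker.
gb_to_mb 20
```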

There is no OSX UI support for this change at this point. For Linux, follow the instructions here:


  • All passwords in this tutorial are preset to “splunk123” (applies to everything).
  • Please do not disable sshd; Hadoop uses rsync to communicate with the nodes.

Pre-installed and pre-configured packages:

Pre-loaded datasets:

Docker commands:


- The standard practice is:     docker exec -it DF01 /bin/bash
- Alternatively, use ssh on port 2122:   ssh -p 2122 root@localhost

To copy files to container: docker cp localfilename DF01:/tmp

To start a container: docker start DF01

To create a container (the first run will take ~5 minutes while the image is pulled):

docker run -d --name=DF01 --hostname=DF01 -p 2122:22 -p 8000:8000 -p 8088:8088 -p 8188:8188 -p 10020:10020 -p 9090:9090 -p 50070:50070  splunknbox/splunk_datafabric

If you don't provide the environment variables with the run command, each one defaults to "YES".

To prevent a service from starting, set its variable to "NO". For example, to run all services except MySQL:

time docker run -d --name=DF01 --hostname=DF01 -p 2122:22 -p 8000:8000 -p 8088:8088 -p 8188:8188 -p 10020:10020 -p 9090:9090 -p 50070:50070 -e MYSQL="NO"  splunknbox/splunk_datafabric
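The defaulting rule above (unset means "YES", "NO" disables a service) can be wrapped in a small helper so the long port list is typed only once. This is a minimal sketch: `df_run` and the `DRY_RUN` switch are hypothetical, and `MYSQL` is the only variable name confirmed in this README; other names would come from the list of available variables. By default it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: build the docker run command shown above, adding
# -e NAME=NO for every service name passed as an argument. Services that
# are not passed stay unset, which the image treats as "YES".
# DRY_RUN=1 (the default here) prints the command; DRY_RUN=0 runs it.
df_run() {
  services="$*"
  set -- docker run -d --name=DF01 --hostname=DF01
  for port in 2122:22 8000:8000 8088:8088 8188:8188 10020:10020 9090:9090 50070:50070; do
    set -- "$@" -p "$port"
  done
  for svc in $services; do
    set -- "$@" -e "$svc=NO"
  done
  set -- "$@" splunknbox/splunk_datafabric
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same as the MySQL example above, printed rather than executed:
df_run MYSQL
```

Running `DRY_RUN=0 df_run MYSQL` would actually start the container. The quotes in `-e MYSQL="NO"` are removed by the shell anyway, so `-e MYSQL=NO` is equivalent.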

Available variables you can use with the docker run command:


The docker run command above publishes all ports used by the external services. To make more ports visible outside the container, consult the EXPOSE statements in the Dockerfile and add a matching -p option to the docker run command for each one.
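EXPOSE only documents a port; it is the -p option that publishes it. A minimal sketch that turns EXPOSE-style entries into matching -p options, assuming you want the same port number on the host; `publish_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn EXPOSE-style entries ("PORT/proto") into
# docker run publish options ("-p PORT:PORT/proto").
publish_flags() {
  flags=""
  for entry in "$@"; do
    port=${entry%%/*}              # strip the "/tcp" or "/udp" suffix
    flags="$flags -p $port:$entry"
  done
  printf '%s\n' "${flags# }"       # drop the leading space
}

# With Docker and the image available, the exposed ports can be listed and
# converted like this:
#   ports=$(docker inspect --format '{{range $k, $v := .Config.ExposedPorts}}{{$k}} {{end}}' splunknbox/splunk_datafabric)
#   publish_flags $ports

publish_flags 8088/tcp 9090/tcp
```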

Externally available web services:

http://localhost:8000		Splunk (wouldn't have any other way!)
http://localhost:50070		Hadoop HDFS (NameNode)
http://localhost:9090		Apache NiFi
http://localhost:8088		Hadoop YARN (ResourceManager)
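After the container starts, the services can take a while to come up. A minimal sketch that checks which of the web UIs listed above respond, assuming `curl` is installed on the host; `check_ui` is a hypothetical helper, and any HTTP response (including a redirect or an error page) counts as up.

```shell
#!/bin/sh
# Hypothetical helper: report whether a web UI answers at the given URL.
# Without -f, curl exits 0 for any HTTP response; it fails on connection
# errors or when the 5-second timeout expires.
check_ui() {
  if curl -s -o /dev/null --max-time 5 "$1"; then
    printf 'up:   %s\n' "$1"
  else
    printf 'down: %s\n' "$1"
  fi
}

for url in http://localhost:8000 http://localhost:50070 \
           http://localhost:9090 http://localhost:8088; do
  check_ui "$url"
done
```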


Thanks to the following individuals from Splunk for creating this tutorial and configuring Splunk:

Rannan Dagan
Scott Haskell


If you'd like to build the image from scratch, you can use my script. Please be aware that once the image is created locally, it will not be pulled from my Docker Hub repository (splunknbox). To pull it from there, you must first manually delete your locally created image (docker rmi splunk_datafabric).

Document last update:
$VERSION: [v1.1-8] $
$DATE: [Wed Dec 27,2017 - 06:18:00PM -0600] $
$AUTHOR: [mhassan2] $


License: Apache License 2.0


Languages: Python 88.3%, Shell 6.5%, JavaScript 2.2%, Ruby 1.5%, HTML 1.1%, CSS 0.4%, Batchfile 0.0%