gnusosa / hadoop-ds-workshop

Hadoop Data Science workshop

hadoop-ds-workshop

This repository contains the files used for the Hadoop Data Science workshop, which originated at the GOTO conference in Amsterdam, 2015.

README.md
Meaning that you should read this
├── tutorial
	├── 01-IPython-notebook.ipynb
		Explore the possibilities of an IPython notebook
	├── 01-IPython-notebook-exercise.ipynb
		Play around with a notebook
	├── 02-Apache-Spark.ipynb
		Explore Spark for data processing, and see how it differs from regular Python
	├── 02-Apache-Spark-exercise.ipynb
	├── 02-Apache-Spark-solution.ipynb
		Load and process some data with Spark
	├── 03-Pandas.ipynb
		Explore Pandas for data processing and visualization
	├── 03-Pandas-exercise.ipynb
	├── 03-Pandas-solution.ipynb
		Pandas: DIY and enjoy!
	├── 04-Machine-Learning-example.ipynb
		Find your way in solving a simple problem using machine learning and Spark
	├── example_module.py
		This is how you create a module in Python
	├── fizzbuzz.csv
		Small dataset that is used throughout the notebooks
├── exploration
	Inspect these files if you have time and are up for a challenge!
	Here, you will predict how many upvotes a Stack Exchange question
	will have received after a month,
	based on the number of upvotes it received in the first day.
	├── explore.ipynb
	├── plots.ipynb
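
To give a feel for the exploration task described above, here is a minimal sketch of predicting upvotes after a month from first-day upvotes with a one-variable least-squares fit. It is not taken from the notebooks: the function names `fit_line` and `predict` are hypothetical, the numbers are made up for illustration, and the notebooks themselves may well use Spark and a different model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, first_day_upvotes):
    """Apply a fitted (slope, intercept) pair to a first-day upvote count."""
    slope, intercept = model
    return slope * first_day_upvotes + intercept


# Made-up illustrative numbers, NOT taken from the workshop data.
first_day = [0, 1, 2, 4, 8]       # upvotes in the first day
after_month = [1, 3, 5, 9, 17]    # upvotes after a month

model = fit_line(first_day, after_month)
estimate = predict(model, 5)      # estimate for a question with 5 first-day upvotes
```

A straight line is only a starting point here; real vote counts are likely skewed, so comparing it against other models in the notebooks is part of the challenge.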
