SergiyKolesnikov / learning-getting-started-with-databricks

Notebooks for the O'Reilly course "Getting Started with Databricks" by Robert Ilijason

Getting Started with Databricks

This repository contains the notebooks that I have created while following the O'Reilly course "Getting Started with Databricks: Tools for Understanding Massive Data Sets" by Robert Ilijason.

The course is a good hands-on introduction to the Databricks platform, a platform for data engineering, machine learning, and analytics. It uses a simple machine learning project to demonstrate the capabilities of Databricks, but it is not an introduction to machine learning, data engineering, or analytics.

Note that the chapters "Looking at the Data" and "Cleaning the Data" appear in the wrong order in the course: "Cleaning the Data" should come first. My notebooks are in the correct order.

You can explore the Databricks platform for free by signing up for the Community Edition. The Community Edition offers limited functionality, but it is enough to get a first impression. You can import my notebooks into your Databricks workspace as described here.

The UCI wine quality dataset used in the notebooks can be downloaded here.

License: GNU General Public License v3.0