Databricks AI Summit 2021 JSON talk

This repository contains the code samples for the Databricks AI Summit 2021 JSON talk, "Eliminating the JSON Tax in Apache Spark", aimed at beginner-level Spark developers.

Slides are available on Google Slides.

The profile_json notebook shows Spark's built-in options for reading JSON: automatic schema inference and a manually specified schema. It then provides a UDF for profiling JSON, which helps inform explicit schemas that account for schema drift.
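To illustrate the kind of profiling described above, here is a minimal sketch in plain Python. It is not the notebook's UDF: the function names (`json_type`, `walk`, `profile_json`), the path notation, and the `<malformed>` bucket are all assumptions made for this example. It counts which JSON types appear at each field path, so a field that shows up with more than one type is a likely sign of schema drift.

```python
import json
from collections import Counter, defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def walk(value, path, counts):
    """Record the type seen at `path`, then recurse into objects and arrays."""
    counts[path][json_type(value)] += 1
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{path}.{key}" if path else key, counts)
    elif isinstance(value, list):
        for child in value:
            walk(child, f"{path}[]", counts)


def profile_json(lines):
    """Map each field path to a count of the JSON types observed there.

    `lines` is an iterable of JSON strings, one record per string.
    Unparseable records are tallied under the hypothetical "<malformed>" key.
    """
    counts = defaultdict(Counter)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<malformed>"]["unparseable"] += 1
            continue
        walk(record, "", counts)
    # Drop the empty root path; only named fields are of interest here.
    return {path: dict(types) for path, types in counts.items() if path}


if __name__ == "__main__":
    sample = [
        '{"id": 1, "tags": ["a"]}',
        '{"id": "2", "user": {"name": "x"}}',  # "id" drifts from integer to string
        "not json",
    ]
    for path, types in sorted(profile_json(sample).items()):
        print(path, types)
```

In this sample, the `id` path is reported with both an integer and a string count, which is the signal that would prompt choosing an explicit type for that column rather than relying on inference. In a Spark job, logic like this could be wrapped in a UDF and applied to a column of raw JSON strings, though how the notebook does so is not shown here.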
