This repository contains the code samples for the Databricks AI Summit 2021 talk "Eliminating the JSON Tax in Apache Spark", aimed at beginner-level Spark developers.
Slides are available on Google Slides.
The profile_json notebook first shows Spark's built-in options for reading JSON: automatic schema inference and a manually specified schema. It then provides a UDF for profiling JSON, which helps inform explicit schemas with awareness of schema drift.
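
To give a rough idea of what profiling JSON for schema drift can look like, here is a minimal plain-Python sketch. It is not the notebook's UDF: the names `json_type`, `walk`, and `profile_json`, the `$`-rooted path notation, and the sample records are all assumptions made for illustration. It counts, for every field path, how many records carry each JSON type, so that fields with mixed types or sparse presence stand out before an explicit schema is written.

```python
import json
from collections import Counter, defaultdict


def json_type(value):
    """Map a parsed Python value to a JSON type name (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def walk(value, path, out):
    """Collect (path, type) pairs for a value and everything nested inside it."""
    out.append((path, json_type(value)))
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{path}.{key}", out)
    elif isinstance(value, list):
        for item in value:
            walk(item, f"{path}[]", out)


def profile_json(lines):
    """Return {path: {type: record_count}} for an iterable of JSON strings."""
    profile = defaultdict(Counter)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Track unparseable records instead of failing the whole profile.
            profile["<corrupt>"]["unparseable"] += 1
            continue
        pairs = []
        walk(record, "$", pairs)
        # Deduplicate so each (path, type) is counted once per record.
        for path, type_name in set(pairs):
            profile[path][type_name] += 1
    return {path: dict(counts) for path, counts in profile.items()}


# Made-up sample records: "id" drifts from integer to string,
# "user.age" appears in only one record, and the last line is truncated.
sample = [
    '{"id": 1, "user": {"name": "a"}, "tags": ["x"]}',
    '{"id": "2", "user": {"name": "b", "age": 30}}',
    '{"id": 3, "tags": []',
]

report = profile_json(sample)
for path in sorted(report):
    print(path, report[path])
```

In this sample, `$.id` is reported with both an integer and a string count, which is the kind of drift an explicit schema has to account for. In Spark, logic along these lines would typically be applied to the raw JSON strings, for example through a UDF, rather than to an in-memory list.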