sheelc / dbaisummit-2021-json-talk

Code examples from Databricks AI Summit 2021 JSON talk

Databricks AI Summit 2021 JSON talk

This repository contains the code samples for the Databricks AI Summit 2021 JSON talk, "Eliminating the JSON Tax in Apache Spark", aimed at beginner-level Spark developers.

Slides are available on Google Slides.

The profile_json notebook first demonstrates Spark's built-in options for reading JSON: automatic schema inference and a manually specified schema. It then defines a UDF that profiles JSON data, which helps you write explicit schemas that account for schema drift.
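
To give a rough idea of what profiling JSON for schema drift can look like, here is a minimal sketch in plain Python. It is not the UDF from the notebook. The function names (`json_type`, `profile_json`), the `$`-rooted path notation, and the sample records are all assumptions made for illustration. The sketch counts which JSON types appear at each field path, so fields whose type changes between records stand out.

```python
import json
from collections import Counter, defaultdict


def json_type(value):
    """Map a parsed Python value to a JSON-style type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def profile_json(json_strings):
    """Count how often each type appears at each field path (hypothetical helper)."""
    profile = defaultdict(Counter)

    def walk(value, path):
        # Record the type seen at this path, then descend into children.
        profile[path][json_type(value)] += 1
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}")
        elif isinstance(value, list):
            for item in value:
                walk(item, f"{path}[]")

    for raw in json_strings:
        try:
            walk(json.loads(raw), "$")
        except json.JSONDecodeError:
            # Keep a count of unparseable records instead of failing the whole run.
            profile["$"]["<malformed>"] += 1
    return {path: dict(counts) for path, counts in profile.items()}


# Made-up sample records: "id" switches type and "user" is sometimes null.
records = [
    '{"id": 1, "user": {"name": "a"}, "tags": ["x", "y"]}',
    '{"id": "2", "user": {"name": "b", "age": 30}}',
    '{"id": 3, "user": null}',
]
profile = profile_json(records)

# Paths with more than one observed type are candidates for schema drift.
drifting = {path: types for path, types in profile.items() if len(types) > 1}
print(drifting)
```

In Spark, similar logic could be run over a column of raw JSON strings, for example by wrapping it in a UDF or applying it per partition; how the notebook does this is not shown here. The resulting counts can then guide which fields to declare in an explicit schema and with which types.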

