Working with Known JSON Schemas - Lab

Introduction

In this lab, you'll practice working with JSON files whose schema you know beforehand.

Objectives

You will be able to:

Use the json module to load and parse JSON documents
Extract data using predefined JSON schemas
Convert JSON to a pandas dataframe

Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:

or a fully expanded view:

You can read more in the documentation here.

Note that this schema differs from the one used in the previous lesson, although both come from the New York Times.

Loading the JSON Data

Open the JSON file located at ny_times_movies.json, and use the json module to load the data into a variable called data.
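
As a reminder of the general pattern, here is a minimal sketch on a made-up document rather than the lab file. The names sample, sample_path, and sample_data are hypothetical, and the temporary file exists only so the example is self-contained. It shows that json.load takes an open file object and returns ordinary Python objects (here, a dict).

```python
import json
import os
import tempfile

# Hypothetical sample document (not the NY Times data)
sample = {
    "status": "OK",
    "num_results": 2,
    "results": [{"title": "Film A"}, {"title": "Film B"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.json")

    # Write the sample to disk so there is a file to read back
    with open(sample_path, "w") as f:
        json.dump(sample, f)

    # json.load parses the contents of an open file object
    with open(sample_path) as f:
        sample_data = json.load(f)

print(type(sample_data))
print(list(sample_data.keys()))
```

The same open-then-json.load pattern should apply to ny_times_movies.json, assuming the file sits in the working directory.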

# Your code here

Run the code below to investigate its contents:

# Run this cell without changes
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

Loading Results

Create a variable results that contains the value associated with the 'results' key.

# Your code here

Below we display this variable as a table using pandas:

# Run this cell without changes
import pandas as pd
df = pd.DataFrame(results)
df

Data Analysis

Now that you have a general sense of the data, answer some questions about it.

How many results are in the file?

The metadata says this:

# Run this cell without changes
data['num_results']

Double-check that by looking at results. Does it line up?
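
One way to make this kind of check, sketched on made-up data: compare the count reported in the metadata with the actual length of the list. The names sample_data, reported_count, and actual_count are hypothetical and chosen so they do not clash with the lab's variables.

```python
# Hypothetical document with a metadata count and a list of records
sample_data = {
    "num_results": 3,
    "results": [{"title": "Film A"}, {"title": "Film B"}, {"title": "Film C"}],
}

reported_count = sample_data["num_results"]   # what the metadata claims
actual_count = len(sample_data["results"])    # what the list actually holds

counts_match = reported_count == actual_count
print(reported_count, actual_count, counts_match)
```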

# Your code here

"""
Your written answer here
"""

How many unique critics are there?

A critic's name can be identified using the 'byline' key. Assign your answer to the variable unique_critics.
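
A common approach to counting distinct values is to collect them into a set, which discards duplicates. The sketch below uses made-up records; only the 'byline' key comes from the lab's schema, and the names sample_results, sample_bylines, and sample_unique_count are hypothetical.

```python
# Hypothetical records; two of the three share a byline
sample_results = [
    {"title": "Film A", "byline": "JANE DOE"},
    {"title": "Film B", "byline": "JOHN ROE"},
    {"title": "Film C", "byline": "JANE DOE"},
]

# A set comprehension keeps each distinct byline once
sample_bylines = {review["byline"] for review in sample_results}
sample_unique_count = len(sample_bylines)

print(sample_unique_count)
```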

# Your code here

This code checks your answer.

# Run this cell without changes
assert unique_critics == 7

Flattening Data

Create a list review_urls that contains the URL for each review. This can be found using the 'url' key nested under 'link'.
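
Nested values can be reached by chaining key lookups, and a list comprehension applies that lookup to every record. The sketch below uses made-up records and example.com URLs; the names sample_results and sample_urls are hypothetical, and the nesting of 'url' under 'link' follows the description above.

```python
# Hypothetical records, each with a nested 'link' dictionary
sample_results = [
    {"title": "Film A", "link": {"type": "article", "url": "https://example.com/film-a"}},
    {"title": "Film B", "link": {"type": "article", "url": "https://example.com/film-b"}},
]

# Chain the lookups: first the 'link' dict, then its 'url' value
sample_urls = [review["link"]["url"] for review in sample_results]

print(sample_urls)
```

This flattens one level of nesting into a plain list of strings, one per record and in the same order as the records.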

# Your code here (create more cells as needed)

The following code will check your answer:

# Run this cell without changes

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of results
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'

Summary

Well done! In this lab you continued to practice extracting and transforming data from JSON files with known schemas.
