JSON

Introduction

In this lecture, you'll continue investigating new formats for data. Specifically, you'll investigate one of the most popular data formats for the web: JSON files.

Objectives

You will be able to:

  • Use the JSON module to load and parse JSON documents

JSON

JSON stands for JavaScript Object Notation. It was introduced to simplify exchanging data between systems. It is now a common standard for data transfer on the web, and parsing packages exist for many languages (including Python)!

Here's a brief preview of a JSON file:
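As a rough illustration, the sketch below parses a small JSON document from a string with json.loads. The document itself (the contents of `sample_text` and all of its field names) is made up for this example and is not taken from the lesson's dataset.

```python
import json

# Hypothetical JSON document, invented for illustration only
sample_text = """
{
  "name": "Example Candidate",
  "office": {"code": 5, "borough": null},
  "payments": [1200.5, 300, 0],
  "active": true
}
"""

sample = json.loads(sample_text)  # parse a JSON string into Python objects
print(sample["office"]["code"])   # nested objects become nested dictionaries
print(sample["payments"][0])      # arrays become lists
```

Note how objects nest inside objects and arrays: this is the hierarchical structure you will see in larger files.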

Loading JSON Data

Prebuilt Python modules exist that will give you a powerful starting point for accessing and manipulating the underlying data in JSON files. We will work with the json module.

The JSON Module

https://docs.python.org/3.6/library/json.html

import json

To load a JSON file, first open it with Python's built-in open function, then pass the resulting file object to the json module's load method. Here, the data is loaded as a Python dictionary.

# Open the file in a with block so it is closed automatically after loading
with open('nyc_2001_campaign_finance.json') as f:
    data = json.load(f)
print(type(data))
<class 'dict'>

JSON files are often nested in a hierarchical structure and contain data structures analogous to Python dictionaries and lists. You can begin to investigate a particular file using familiar Python methods. Here are the built-in data types supported in JSON and their Python counterparts:

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None
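The mapping between JSON types and Python types can be checked directly by parsing one value of each kind. This is a minimal sketch; the keys and values in `text` are hypothetical and chosen only to cover every JSON type.

```python
import json

# One hypothetical value for each JSON type
text = (
    '{"obj": {"a": 1}, "arr": [1, 2], "s": "hi", '
    '"i": 3, "r": 2.5, "t": true, "f": false, "n": null}'
)
parsed = json.loads(text)

# Record the Python type that each JSON value was converted to
types = {key: type(value).__name__ for key, value in parsed.items()}
for key, type_name in types.items():
    print(key, type_name)
```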

Check the keys of the dictionary:

data.keys()
dict_keys(['meta', 'data'])

Investigate what data types are stored within the values associated with those keys:

for v in data.values():
    print(type(v))
<class 'dict'>
<class 'list'>
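Checking keys and types one level at a time can get tedious with deeply nested data. One possible shortcut is a small recursive helper that summarizes the first few levels at once. The helper name `describe`, its `max_depth` parameter, and the `toy` dictionary are all hypothetical; `toy` only imitates the 'meta' and 'data' layout seen above.

```python
def describe(obj, depth=0, max_depth=2):
    """Return summary lines for the nested structure of obj."""
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{indent}dict with {len(obj)} key(s)")
        if depth < max_depth:
            for key, value in obj.items():
                child = describe(value, depth + 1, max_depth)
                lines.append(f"{indent}  {key!r}: {child[0].strip()}")
                lines.extend(child[1:])
    elif isinstance(obj, list):
        lines.append(f"{indent}list with {len(obj)} item(s)")
        if obj and depth < max_depth:
            # Peek at the first item only, assuming the rest look similar
            child = describe(obj[0], depth + 1, max_depth)
            lines.append(f"{indent}  [0]: {child[0].strip()}")
            lines.extend(child[1:])
    else:
        lines.append(f"{indent}{type(obj).__name__}")
    return lines


# Hypothetical stand-in for the loaded file
toy = {
    "meta": {"view": {"name": "2001 Campaign Payments"}},
    "data": [[1, "CANDID"], [2, "CANDNAME"]],
}
summary = describe(toy)
print("\n".join(summary))
```

With the real file loaded, calling describe(data) would give a similar overview, with the depth of detail controlled by max_depth.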

You can quickly preview the first dictionary as a DataFrame; to do so, first import the pandas package.

import pandas as pd
pd.DataFrame.from_dict(data['meta'])
view
attribution Campaign Finance Board (CFB)
averageRating 0
category City Government
columns [{'id': -1, 'name': 'sid', 'dataTypeName': 'me...
createdAt 1315950830
description A listing of public funds payments for candida...
displayType table
downloadCount 1470
flags [default, restorable, restorePossibleForType]
grants [{'inherited': False, 'type': 'viewer', 'flags...
hideFromCatalog False
hideFromDataJson False
id 8dhd-zvi6
indexUpdatedAt 1536596254
metadata {'rdfSubject': '0', 'rdfClass': '', 'attachmen...
name 2001 Campaign Payments
newBackend False
numberOfComments 0
oid 4140996
owner {'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...
provenance official
publicationAppendEnabled False
publicationDate 1371845179
publicationGroup 240370
publicationStage published
query {}
rights [read]
rowClass
rowsUpdatedAt 1371845177
rowsUpdatedBy 5fuc-pqz2
tableAuthor {'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...
tableId 932968
tags [finance, campaign finance board, cfb, nyccfb,...
totalTimesRated 0
viewCount 233
viewLastModified 1536605717
viewType tabular

Notice the 'columns' entry, which holds the column names; these will be very useful!
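One way to put the column names to use is to pull them out of the metadata and pair them with each record. The sketch below works on a hypothetical stand-in dictionary (`toy`) rather than the real file, and it assumes the names under 'columns' line up one-to-one with the entries of each record, which is worth verifying on the actual data.

```python
# Hypothetical stand-in mirroring the layout shown above: column descriptions
# live under meta -> view -> columns, and the records live under data.
toy = {
    "meta": {"view": {"columns": [
        {"id": -1, "name": "sid"},
        {"id": -1, "name": "id"},
        {"id": 1, "name": "CANDID"},
    ]}},
    "data": [
        [1, "row-a", "A1"],
        [2, "row-b", "B2"],
    ],
}

# Collect the 'name' field from each column description
column_names = [col["name"] for col in toy["meta"]["view"]["columns"]]

# Pair every record with the column names (assumes equal lengths)
records = [dict(zip(column_names, row)) for row in toy["data"]]
print(column_names)
print(records[0])
```

If the lengths do match in the real file, the same names could be passed to pandas, for example pd.DataFrame(data['data'], columns=column_names).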

Investigate further information about the list stored under the 'data' key:

len(data['data'])
285

Previewing the first entry:

data['data'][0]
[1,
 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
 1,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
 None,
 'CANDID',
 'CANDNAME',
 None,
 'OFFICEBORO',
 None,
 'CANCLASS',
 None,
 None,
 None,
 None]
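The eighth element of this entry (index 7) is itself a string containing JSON. A string like that can be parsed with json.loads, the string counterpart of json.load. The sketch below copies the string from the output above; the variable names `embedded` and `parsed` are just illustrative.

```python
import json

# Index 7 of the first entry, copied from the output above
embedded = (
    '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n'
    '    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n'
    '    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n'
    '    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}'
)

# json.loads parses a string; json.load reads from a file object
parsed = json.loads(embedded)
print(type(parsed))
print(parsed["invalidCells"]["1519001"])
```

With the real file loaded, the equivalent call would be json.loads(data['data'][0][7]).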

Summary

As you can see, there's still a lot going on here with the deeply nested structure of JSON data files. In the upcoming lab, you'll get a chance to practice loading files and conducting some initial preview of the data as you did here.
