plotly / datasets

Datasets used in Plotly examples and documentation

Home Page: https://plotly.github.io/datasets


geojson-counties-fips.json has state-sized holes in it

FlorinAndrei opened this issue

from urllib.request import urlopen
import json
import requests
import os
import pandas as pd
import plotly.express as px

with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

tsfile = 'time_series_covid19_confirmed_US.csv'
tsurl = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/' + tsfile

if not os.path.exists(tsfile):
    req = requests.get(tsurl)
    with open(tsfile, 'wb') as f:
        f.write(req.content)
ts = pd.read_csv(tsfile)

ts.dropna(inplace=True)
ts = ts[ts['FIPS'] < 80000].copy(deep=True)

ts_short = ts[['FIPS', '5/9/20', '5/10/20']].copy(deep=True)
ts_short['delta'] = ts_short['5/10/20'] - ts_short['5/9/20']
ts_short = ts_short[ts_short['delta'] >= 0].copy(deep=True)
dmin = ts_short['5/10/20'].min()
dmax = ts_short['5/10/20'].max()

fig = px.choropleth(ts_short, geojson=counties, locations='FIPS', color='5/10/20',
                           color_continuous_scale="Viridis",
                           range_color=(dmin, dmax),
                           scope="usa"
                          )

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

fig.show()

This is rendered with holes:

[screenshot: map with holes]

But the same data with create_choropleth() works fine:

import plotly.figure_factory as pff
fig2 = pff.create_choropleth(fips=ts_short['FIPS'], values=ts_short['5/10/20'])
fig2.show()

[screenshot: good map]

I would prefer to use px.choropleth() because it seems easier to customize, but this bug completely breaks it.

Plotly 4.6.0
Python 3.7.7
Jupyter notebook
Anaconda
Windows

So this dataset is used in the example here https://plotly.com/python/choropleth-maps/#choropleth-map-using-geojson and it works OK. I suspect the problem is that your FIPS column is a number rather than a string, and hence the FIPS codes that have a leading zero are not being matched. In the doc link above we explicitly cast the FIPS column to a string to avoid this problem.

Looks like Plotly actually expects those codes to be strings left-padded with zeros!
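A quick sketch of the mismatch, using a hypothetical county code: a numeric FIPS value loses its leading zero when cast to a string, so it never matches the GeoJSON id.

```python
# Hypothetical county code 1001.0, stored as a float in the CSV.
# The GeoJSON ids are 5-character strings such as "01001", so a plain
# str() cast of the numeric value never matches.
fips_float = 1001.0
plain = str(int(fips_float))   # "1001"  -- leading zero is lost
padded = plain.zfill(5)        # "01001" -- matches the GeoJSON id format
print(plain, padded)
```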

That field in the COVID data is originally a float-like string with a single digit after the decimal point, e.g. 1429.0

So I did a bit of data juggling: I load it as float, cast it to int, cut off bad values, cast it to str, left-pad it with zeros, then use it. And then it works - almost. Here's the full code:

import json
import requests
import os
import pandas as pd
import plotly.express as px

geofile = 'geojson-counties-fips.json'
geourl = 'https://raw.githubusercontent.com/plotly/datasets/master/' + geofile
if not os.path.exists(geofile):
    req = requests.get(geourl)
    with open(geofile, 'wb') as f:
        f.write(req.content)

with open(geofile) as f:
    counties = json.load(f)

tsfile = 'time_series_covid19_confirmed_US.csv'
tsurl = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/' + tsfile

if not os.path.exists(tsfile):
    req = requests.get(tsurl)
    with open(tsfile, 'wb') as f:
        f.write(req.content)

ts = pd.read_csv(tsfile, dtype={"FIPS": float})
ts.dropna(inplace=True)
ts['FIPS'] = ts['FIPS'].astype('int64', copy=True)
ts = ts[ts['FIPS'] < 80000].copy(deep=True)
ts['FIPS'] = ts['FIPS'].astype('str', copy=True)

ts['FIPS'] = ts['FIPS'].str.rjust(5, '0')

ts_short = ts[['FIPS', '5/9/20', '5/10/20']].copy(deep=True)
ts_short['delta'] = ts_short['5/10/20'] - ts_short['5/9/20']
ts_short = ts_short[ts_short['delta'] >= 0].copy(deep=True)

dmin = ts_short['5/10/20'].min()
dmax = ts_short['5/10/20'].max()

fig = px.choropleth(ts_short, geojson=counties, locations='FIPS', color='5/10/20',
                           color_continuous_scale="Inferno",
                           range_color=(dmin, dmax),
                           scope="usa"
                          )

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

fig.show()

[screenshot: new map]

There are still some missing counties.

Anyway, the older create_choropleth() will happily take everything and not complain about it. That's what was confusing. Even the remaining missing counties on this map do not appear as holes in the old map.

You should probably indicate in the docs that this library expects left-padded strings for FIPS.

There's still something going on with those little missing chunks.

It's actually not that the library expects left-padded strings, it's that the IDs in the GeoJSON file are strings with left-padded zeros. If you had a different GeoJSON file of your own without the zeros then you'd have to omit them :) The missing counties are probably just badly coded in the GeoJSON file. This file is just one we use for our docs, and not meant as a reference for counties by FIPS code... if you want to source and use a different one that meets your needs better, I would recommend doing that.
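A minimal sketch of that point, with a toy stand-in for the real file: the join only works because each feature's `id` happens to be a zero-padded 5-character string, so the dataframe column must match that exact format.

```python
# Toy stand-in for counties = json.load(...) on geojson-counties-fips.json;
# the real file has the same "features"/"id" layout.
counties = {"features": [{"id": "01001"}, {"id": "01003"}]}

ids = [f["id"] for f in counties["features"]]
# Every id is a zero-padded string, not a number.
print(all(isinstance(i, str) and len(i) == 5 for i in ids))  # True
```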

There's nothing wrong with your GeoJSON, sorry for the trouble.

Looks like the old create_choropleth(), if data was missing, was still somehow rendering those counties as if the data were zero (I think).

Whereas px.choropleth() does not render anything if the FIPS entry is missing from the data, even if that FIPS area exists in GeoJSON. The holes in the map were due to missing FIPS from my data, and the difference in behavior for the newer function.
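Before backfilling, the holes can be diagnosed by diffing the two id sets. The toy data below stands in for the real GeoJSON and dataframe:

```python
import pandas as pd

# Toy stand-in: the GeoJSON knows three counties, the data covers two.
counties = {"features": [{"id": "01001"}, {"id": "01003"}, {"id": "01005"}]}
ts_short = pd.DataFrame({"FIPS": ["01001", "01003"], "5/10/20": [10, 20]})

# Any id in the GeoJSON with no matching row in the data becomes a hole.
geo_ids = {f["id"] for f in counties["features"]}
missing = sorted(geo_ids - set(ts_short["FIPS"]))
print(missing)  # ['01005']
```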

To fix it, all I had to do was this:

# if a FIPS code is missing from the data, backfill it from the GeoJSON,
# or else the map will have holes in it
# extract all known FIPS codes from the GeoJSON
allcodes = [c['id'] for c in counties['features']]
# find FIPS codes present in the GeoJSON but absent from the data,
# then backfill them with zeros in one concat (DataFrame.append is
# deprecated, and checking membership against a set avoids rebuilding
# the list on every iteration)
have = set(ts_short['FIPS'])
missing = [c for c in allcodes if c not in have]
backfill = pd.DataFrame({'FIPS': missing, '5/9/20': 0, '5/10/20': 0, 'delta': 0})
ts_short = pd.concat([ts_short, backfill], ignore_index=True)

And now it works just fine:

[screenshot: good map]

Thank you for all the explanations!

OK, glad you got things working!