ActiveConclusion / COVID19_mobility

COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉

Scraper of Mobility reports v 2.0

ActiveConclusion opened this issue

Google recently published a mobility report with time-series in CSV format. You can download it on their website.
That means there's no need for a PDF file parser anymore, so I plan to change the concept of this repository.
Here is what I propose to implement:

  1. Archive the PDF parser as part of the great history of this repository.
  2. Automatically download to this repository all available files (including PDF) from the Google and Apple sites. The Google reports pose no problems, but the Apple website parser needs to be rewritten, because my ad-hoc solution unfortunately does not work.
  3. Make one summary file from Google and Apple reports of the following structure:
     | country | sub_region_1 | sub_region_2 | date | retail | grocery_and_pharmacy | parks | transit_stations | workplaces | residential | walking | driving | transit |
     |---|---|---|---|---|---|---|---|---|---|---|---|---|
     | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
  4. Make a simple visualization app for this data (for example, using the Bokeh library).
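
The summary structure from point 3 could be assembled roughly like this. It is only a minimal sketch, assuming both reports are already loaded as lists of dicts that use the column names listed above. `build_summary` and the constants are hypothetical names, and the sample values are made up for illustration.

```python
# Column order of the proposed summary file.
KEY_COLUMNS = ["country", "sub_region_1", "sub_region_2", "date"]
GOOGLE_COLUMNS = ["retail", "grocery_and_pharmacy", "parks",
                  "transit_stations", "workplaces", "residential"]
APPLE_COLUMNS = ["walking", "driving", "transit"]
SUMMARY_COLUMNS = KEY_COLUMNS + GOOGLE_COLUMNS + APPLE_COLUMNS


def build_summary(google_rows, apple_rows):
    """Outer-join two lists of dicts on the key columns.

    Metrics that one source does not provide for a key stay None.
    """
    merged = {}
    for rows, metric_columns in ((google_rows, GOOGLE_COLUMNS),
                                 (apple_rows, APPLE_COLUMNS)):
        for row in rows:
            # Missing or empty region levels are normalized to "".
            key = tuple(row.get(col) or "" for col in KEY_COLUMNS)
            if key not in merged:
                merged[key] = dict.fromkeys(SUMMARY_COLUMNS)
                merged[key].update(zip(KEY_COLUMNS, key))
            for col in metric_columns:
                if col in row:
                    merged[key][col] = row[col]
    # Sort by country, sub-regions, then date for a stable output.
    return [merged[key] for key in sorted(merged)]


# Made-up sample rows, for illustration only.
google = [
    {"country": "Italy", "date": "2020-04-10", "retail": -85},
    {"country": "Italy", "date": "2020-04-11", "retail": -90},
]
apple = [
    {"country": "Italy", "date": "2020-04-11", "walking": 15, "driving": 20},
]
summary = build_summary(google, apple)
```

An outer join keeps the dates and regions that only one of the two sources covers, which seems safer than silently dropping them.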

Feel free to offer your suggestions here.
Thank you!

Great! I will pull this into balefire.info for the USA. FYI, the color shading issue was resolved. I am using D3 and D3Plus for my visuals. The drawback is that my visuals are obviously USA-only. I can look into doing a global version, since it would mostly involve ignoring or reducing the data merges.

I got the Google data into the system. It is pretty interesting. I do need to sit down with it at some point, but there are some pretty telling Pearson coefficients correlating with it. I also see that confirmed cases per 10k are higher when mobility is higher. What I really should do is look at the log of confirmed cases two weeks later to see if there is a correlation there. The graphs suggest that is the case. I will put a screenshot below showing 4/11 and one plot of AK. FYI, the university is doing a short article on the tool next week, so I hope to get more info on our data out there.

[Image: Screen Shot 2020-04-17 at 10 39 23 PM]
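
The two-week lag check mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: both inputs are daily lists for one region that start on the same date, and `pearson` and `lagged_log_correlation` are hypothetical helper names, not part of any existing tool.

```python
from math import log, sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def lagged_log_correlation(mobility, cases_per_10k, lag_days=14):
    """Correlate mobility on day t with log(cases per 10k) on day t + lag_days.

    Days whose lagged case count is not positive are skipped, because the
    logarithm is undefined there.
    """
    pairs = [(m, log(c))
             for m, c in zip(mobility, cases_per_10k[lag_days:])
             if c > 0]
    if len(pairs) < 2:
        raise ValueError("not enough overlapping observations")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

Calling `lagged_log_correlation(mobility, cases_per_10k)` with the default lag compares each day's mobility with the case rate 14 days later. A coefficient near zero would argue against the pattern the graphs suggest.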

@ladew222 Cool! I hope to compile a summary file from the Google and Apple reports in the next 2-3 days.

Wow cool.

I've recently made a couple of updates, so here is a summary of what's been done:

  1. Everything related to the PDF parser is now in the directory "scraper v 1.0".
  2. The Apple report is now automatically downloaded to the repository every day. There is a small problem with the Google data, though: the CSV download works, but downloading the PDFs is currently disabled because the structure of the Google webpage has changed significantly. I don't think that's a critical problem.
  3. The summary reports from Google and Apple data that I mentioned above are now generated automatically. They are available here. A few points should be noted:
     • The matching of subregions from the Google data with cities from the Apple data needs further improvement. Currently, they are matched exactly as they appear in the original data.
     • The U.S. data is a serious problem because it is quite heterogeneous. So far, the cities are in the "sub_region_1" column. It is probably better to remove the detailed county breakdown for the US from the summary report.
     • It would be appropriate to adjust the Apple baseline to a longer period that overlaps the Google baseline period (e.g. January 13 to February 6). This is a rough approach, but I think it would be better than just taking January 13th as the baseline.
  4. Google Sheets are now updated automatically.
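
The baseline adjustment for the Apple data could look roughly like this. It is a minimal sketch, assuming one Apple series is held as a dict from date to index value (100 = January 13) and that a plain mean over the January 13 to February 6 window is good enough. `rebase_apple` is a hypothetical name, and a per-weekday baseline would be a natural refinement.

```python
from datetime import date
from statistics import mean

# Window proposed above: the part of the Apple series that overlaps
# the Google baseline period.
BASELINE_START = date(2020, 1, 13)
BASELINE_END = date(2020, 2, 6)


def rebase_apple(series):
    """Convert an Apple index series to percent change from a window mean.

    series maps datetime.date -> index value (None for missing days).
    Returns a dict with the same keys and percent changes rounded to
    one decimal; missing days stay None.
    """
    window = [value for day, value in series.items()
              if BASELINE_START <= day <= BASELINE_END and value is not None]
    if not window:
        raise ValueError("no observations inside the baseline window")
    baseline = mean(window)
    return {day: None if value is None
            else round((value / baseline - 1) * 100, 1)
            for day, value in series.items()}
```

The result is on the same "percent change from baseline" scale as the Google columns, so a value of -50.0 means half of the average baseline-window activity.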

We also need to think about the design of the data visualization app, which should give simple answers about the mobility situation in a particular region.

Cool. Here is the choropleth of residential mobility as it is now, if you haven't seen it.
[Image: Screen Shot 2020-04-20 at 9 01 17 PM]

Wow, looks nice! But I couldn't reproduce this picture in your dashboard :( This is what I got:
[Image: balefire]

Maybe I missed a button or checkbox?

My fault. It looks like Google does not have significant enough data for that metric. Retail and Recreation has the fuller map.

Got it, thanks! I suggest adding a breakdown by states. It would let us see the picture across the whole United States.

Are you thinking about a map by states?

Yes

Last week's Update Digest:

  1. The problem with downloading Google PDF reports is fixed (I fixed it a week ago, I just didn't write about it here).
  2. Apple has added more regions/cities to their report. The main problem is that cities and subregions are listed without country names, but I have already fixed this (it was a challenging issue for me).
  3. With the addition of new data from Apple, there are now huge problems with merging the reports, the scale of which I have not even assessed yet.

Latest updates:

  • Recently, the Google Sheets document with detailed US data crashed because there was too much data for one sheet. My apologies to everyone who used this spreadsheet as a source. For now, you can use the CSV version of this report. I may reformat the spreadsheet (e.g. split the states across tabs), or, unfortunately, it may have to be abandoned for good.

  • Merging of the Apple and Google reports has improved significantly. I have finally made a matching table of Apple and Google subregions. I've also split the summary report into several:

    1. Report by regions (without US counties)
    2. Report by countries (only totals)
    3. Report for the US only

    If you see errors in the matching table, please open an issue right away.

  • Also, I think it's a good idea to add a geo-type column to the Google data (as in the Apple report).
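
One possible way to derive such a geo-type column is sketched below. It assumes each Google row is a dict with "sub_region_1" and "sub_region_2" keys, and the labels are modeled on Apple-style geo types as an assumption. `google_geo_type` is a hypothetical name.

```python
def google_geo_type(row):
    """Guess a geo type from which region levels are filled in.

    The most specific non-empty level decides the label. Cities cannot
    be told apart from other sub-regions this way, so they would still
    need the matching table.
    """
    if row.get("sub_region_2"):
        return "county"
    if row.get("sub_region_1"):
        return "sub-region"
    return "country/region"


# Made-up example rows, for illustration only.
rows = [
    {"country": "United States", "sub_region_1": "", "sub_region_2": ""},
    {"country": "United States", "sub_region_1": "Alaska", "sub_region_2": ""},
    {"country": "United States", "sub_region_1": "Alaska",
     "sub_region_2": "Anchorage"},
]
for row in rows:
    row["geo_type"] = google_geo_type(row)
```

Deriving the label from the filled-in columns avoids maintaining a separate lookup, at the cost of treating every second-level region as a county.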

I haven't written anything here in a while, but I should have. So, point by point:

  • Until today, the parser had been running automatically for a long time without any intervention from me. But strange things are happening on the Apple website today, so I expect problems tonight.
  • I haven't fixed the problem with Google Sheets for the US yet, but there is already some progress.
  • Lately, I've been actively processing the OpenSky COVID-19 Flight Dataset. I hope to put my results in a separate repository within 2-3 days. The main problem is that I do not understand the quality of the data in this dataset or how to evaluate it. But it is what it is. If all goes well, I will also add this data to the merged reports.