lch14forever / covid19_google_mobility

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Extracting data from google’s COVID-19 community mobility reports

Google has recently published reports on the mobilty data for different contires/regions. However, I could not find the raw data behind the plots. Here is my attemp to recover the raw data.

Downloading the report and extract the data graphs

This is done using shell commands.

Process one example image in R (Grocery & pharmacy in Singapore)

Load libraries and an example image

library(grid)
library(dplyr)
library(png)
library(ggplot2)
img <- readPNG("2020-03-29_SG_Mobility_Report_en.2.png")

We can extract the color with a digital color meter:

  • Graph color rgb(81, 135,237) or rgb(0.32, 0.53, 0.93)
  • Grid color rgb(216, 218, 222) or rgb(0.85, 0.85, 0.87)

Find the graph

delta <- 0.08

graph.sel <- abs(img[,,1] - 0.32) < delta & abs(img[,,2] - 0.53) < delta & abs(img[,,3] - 0.93) < delta

## Validate the selection
img[,,1][graph.sel] <- 1
img[,,2][graph.sel] <- 0
img[,,3][graph.sel] <- 0

grid.raster(img[,,])

Find the grid

delta <- 0.08
grid.sel <- abs(img[,,1] - 0.85) < delta & abs(img[,,2] - 0.85) < delta & abs(img[,,3] - 0.85) < delta

img[,,1][grid.sel] <- 0
img[,,2][grid.sel] <- 1
img[,,3][grid.sel] <- 0

grid.raster(img[,,])

Find y values

gridlines.y <- data.frame(rows=which(grid.sel[, round(dim(grid.sel)[2]/2)])) %>% 
    mutate(diff=rows-lag(rows)) %>% 
    filter(diff >2 | is.na(diff)) %>% 
    pull(rows)
zero <- (gridlines.y[1] + last(gridlines.y))/2
scale <- 1.6/abs(gridlines.y[1] - last(gridlines.y))

y <- (zero-apply(graph.sel,2,function(x) mean(which(x))) )* scale

# check extracted data shape
plot(y)

Find x values

## find position of the ticks
tmp <- grid.sel[last(gridlines.y):(last(gridlines.y) + 10),] %>% colSums()

ticks.x <- data.frame(cols=which(tmp > 5)) %>% 
    mutate(diff=cols-lag(cols)) %>% 
    filter(diff >2 | is.na(diff)) %>% pull(cols) 

step <- (last(ticks.x)-ticks.x[1])/42## 43 days
x.sel <- round(seq(ticks.x[1] +1, length=43, by=step))

extracted <- data.frame(Date=seq(as.Date("2020/2/16"), as.Date("2020/3/29"), by=1),
           Relative_mobility=y[x.sel])

## You can now save this as a csv or plain text
head(extracted)
##         Date Relative_mobility
## 1 2020-02-16       -0.15384615
## 2 2020-02-17       -0.06993007
## 3 2020-02-18       -0.09230769
## 4 2020-02-19       -0.13706294
## 5 2020-02-20       -0.12867133
## 6 2020-02-21       -0.12307692

Let’s plot

ggplot(extracted, aes(x=Date, y=Relative_mobility*100)) + 
    geom_area(fill='grey') + 
    geom_line(lwd=2) + 
    theme_bw() 

Session information

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.15.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ggplot2_3.2.1 png_0.1-7     dplyr_0.8.3  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2       knitr_1.20       magrittr_1.5     munsell_0.5.0   
##  [5] tidyselect_0.2.5 colorspace_1.4-1 R6_2.4.0         rlang_0.4.0     
##  [9] stringr_1.4.0    tools_3.5.1      gtable_0.3.0     withr_2.1.2     
## [13] htmltools_0.3.6  lazyeval_0.2.2   yaml_2.2.0       rprojroot_1.3-2 
## [17] digest_0.6.22    assertthat_0.2.1 tibble_2.1.3     crayon_1.3.4    
## [21] purrr_0.3.3      glue_1.3.1       evaluate_0.12    rmarkdown_1.10  
## [25] labeling_0.3     stringi_1.4.3    compiler_3.5.1   pillar_1.4.2    
## [29] scales_1.0.0     backports_1.1.5  pkgconfig_2.0.3

About