reconhub / learn

RECON learn: a free, open platform for training material on epidemics analysis

Home Page:https://reconlearn.org

Stegen case study

thibautjombart opened this issue · comments

@AmmerB @zkamvar I am using this issue to discuss work on the Stegen case study. Plan is:

  • take down the current version, which probably nobody used; it is still present on the website but not linked from any page

  • make a new draft of the case study in a new branch, with rough narrative and all the analyses // Thibaut

  • solve technical issues (see below) // Zhian and Thibaut

  • polish narrative // Amrish

  • review code // Zhian

  • once we have the final post, generate files to distribute to Malta's participants (folder containing data, adapted scripts, Rmd, etc.) // Zhian

Technical issues currently include:

  • buttons don't display on the website, despite displaying fine when compiling using rmarkdown::render

  • add a table of contents

buttons don't display on the website, despite displaying fine when compiling using rmarkdown::render

My answer: https://www.youtube.com/watch?v=5hQXSsbQCMs

once we have the final post, generate files to distribute to Malta's participants (folder containing data, adapted scripts, Rmd, etc.) // Zhian

Guaranteed to be in Malta and out of communication by then. I'll set up the initial materials as best as I can and then you can change them however you want before Malta.

The technical issues are not going to be resolved by the time we get this to Malta, but they are fairly minor. Here is a checklist of things people have found by going through the practical:

From @finlaycampbell:

the code itself looks good to me and worked first try

  • line 335: remove "set" from "We set convert the unique identifiers"
  • line 693: add fullstop after readRDS()

I found a few minor typos in the Stegen practical:

Data exploration - Going Further

  • line 497: "summaries each strata" -> "summarise each strata" (or stratum?)

Risk ratios

  • line 746: "inforamtion" -> "information"

Multiple variables - Going Further

  • line 1059: "calculcate" -> "calculate"
  • line 1093: "agrument" -> "argument"

Conclusion

  • lines 1222-1223: "suffisticated mapping and spatial mathodologies" -> "sophisticated mapping and spatial methodologies"

Some more minor comments:

Loading required packages

  • dplyr: put in italic, and add links to some relevant pages for the packages; code provided below:

  • here: to find the path to data or script files

  • readxl: to read Excel spreadsheets into R

  • readr: to write (and read) spreadsheets as text files

  • incidence: to build epicurves

  • epitrix: to clean labels from our spreadsheet

  • dplyr: to help with factors

  • ggplot2: to create custom visualisations

  • epitools: to calculate risk ratios

  • sf: to read in shapefiles

  • leaflet: to demonstrate interactive maps

Colors used in graphics

  • replace colors for 'cases'; currently the colors for 'cases' and 'females' are very similar. Change the color for cases to #993333:
plot(i_ill, color = c("non case" = "#66cc99", "case" = "#993333"))

on this line and everywhere else.

  • the viridis color legend for p-values may be better off on log10 scale; don't do it if too much of a hassle though

  • on all maps, use the same colors for cases / non cases as elsewhere; might be a case of just adding this to the graphs:

 + scale_color_manual("Illness", values = c("non case" = "#66cc99", "case" = "#993333"))
  • the same could be done with leaflet, but again, not worth it if it can't be done in less than 20 min
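
To make the two ggplot2 suggestions above concrete, here is a minimal sketch, assuming ggplot2 is installed. The `ill_colors` vector, the toy data frames and the p-values are made up for illustration and are not taken from the practical; only the two hex colors and the `scale_color_manual()` call come from the comments above.

```r
library(ggplot2)

# Shared palette, defined once so every plot uses the same colors
# (the object name `ill_colors` is an assumption for this sketch)
ill_colors <- c("non case" = "#66cc99", "case" = "#993333")

# Toy points standing in for mapped cases (coordinates are made up)
toy <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c(2, 1, 4, 3),
  ill = factor(c("case", "non case", "case", "non case"),
               levels = c("non case", "case"))
)

# Reuse the palette through scale_color_manual(), matching colors by name
p_points <- ggplot(toy, aes(x = x, y = y, color = ill)) +
  geom_point(size = 3) +
  scale_color_manual("Illness", values = ill_colors)

# Toy p-values spanning several orders of magnitude (made up)
pvals <- data.frame(
  item    = c("item_a", "item_b", "item_c"),
  p_value = c(1e-12, 1e-4, 0.3)
)

# A log10-transformed viridis scale keeps very small p-values distinguishable
p_pvals <- ggplot(pvals, aes(x = item, y = 1, fill = p_value)) +
  geom_tile() +
  scale_fill_viridis_c("p-value", trans = "log10")
```

Defining the palette once and passing it by name means a later color change only has to be made in one place. Recent ggplot2 releases prefer `transform = "log10"` over `trans = "log10"`; the older spelling is used here because it also works on earlier versions.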

Here are a couple of suggestions for slightly changing bits of wording in the practical:

  • Risk ratios - Going Further: Univariate tests: the numbering of the testing methods is wrong; it should be 1 and 2, not 2 and 3.

  • Risk ratios - Going Further: Is illness linked to gender?: make sure that case/non case runs along the top of the 2x2 table (columns) and the other variable down the side (rows). The current layout is not wrong mathematically, but it is unconventional for epidemiological analysis, and the other 2x2 tables in the practical are the other way around.
    i.e. instead of:
    tab_ill_sex <- table(stegen$ill, stegen$sex)
    it would be preferable to say
    tab_ill_sex <- table(stegen$sex, stegen$ill)
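
For anyone unsure which argument ends up where: in base R's `table()`, the first argument gives the rows and the second gives the columns. A minimal sketch on made-up data (the `toy` data frame and the `tab_sex_ill` name are illustrative only, not the real stegen data):

```r
# Toy data standing in for the stegen data frame (values are made up)
toy <- data.frame(
  sex = factor(c("male", "male", "female", "female",
                 "male", "female", "male", "female")),
  ill = factor(c("case", "non case", "case", "case",
                 "non case", "non case", "case", "case"),
               levels = c("non case", "case"))
)

# First argument -> rows (sex), second argument -> columns (illness status)
tab_sex_ill <- table(sex = toy$sex, ill = toy$ill)
print(tab_sex_ill)

# With the exposure in rows, row-wise proportions give the risk in each group
risk <- prop.table(tab_sex_ill, margin = 1)[, "case"]
print(risk)
```

With this layout, each row reads as "of the people in this group, how many became ill", which is the reading the other 2x2 tables in the practical rely on.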

In addition to @jeskarp's comments, it would also be good to keep everything as a single data frame, subsetting where needed. Currently, to test the risk ratios for each food item, we separate the food items into a new data frame called food. Several students were a bit confused/skeptical of this because Stata apparently likes to re-arrange data when you use the keyword "keep" (I wasn't quite clear on that part).
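
A rough sketch of what the single-data-frame approach could look like, using made-up data and base R only. The practical uses epitools for risk ratios; the hand-rolled `risk_ratio()` helper, the `toy` data frame and the `food_items` vector are assumptions for illustration.

```r
# Toy stand-in for the stegen data frame (column names and values are made up)
toy <- data.frame(
  id       = 1:8,
  ill      = c(1, 1, 1, 0, 0, 1, 0, 1),
  tiramisu = c(1, 1, 1, 0, 0, 1, 1, 0),
  beer     = c(1, 0, 1, 0, 1, 0, 1, 0)
)

# Keep only the *names* of the food columns, not a second data frame
food_items <- c("tiramisu", "beer")

# Hypothetical helper: risk among the exposed divided by risk among the unexposed
risk_ratio <- function(exposure, outcome) {
  risk_exposed   <- mean(outcome[exposure == 1])
  risk_unexposed <- mean(outcome[exposure == 0])
  risk_exposed / risk_unexposed
}

# Subset the columns on the fly; `toy` itself is never modified or copied out
rr <- vapply(toy[food_items], risk_ratio, numeric(1), outcome = toy$ill)
print(rr)
```

Because `toy[food_items]` is only a temporary view used inside the call, the original data frame stays intact, which may address the worry about data being re-arranged or dropped.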

I'm going to close this. The technical issues brought up involve a lot more debugging of CSS than any of us are comfortable with.