Stegen case study

thibautjombart opened this issue 6 years ago

Thibaut Jombart commented 6 years ago

@AmmerB @zkamvar I am using this issue to discuss work on the Stegen case study. The plan is:

take down the current version, which probably nobody used; it is still on the website but not linked from any page
make a new draft of the case study in a new branch, with rough narrative and all the analyses // Thibaut
solve technical issues (see below) // Zhian and Thibaut
polish narrative // Amrish
review code // Zhian
once we have the final post, generate files to distribute to the Malta participants (folder containing data, adapted scripts, Rmd, etc.) // Zhian

Thibaut Jombart · Answer 1 · Fri Oct 26 2018 17:22:18 GMT+0800 (China Standard Time)

Technical issues currently include:

buttons don't display on the website, despite displaying fine when compiling using rmarkdown::render
add a table of contents

Zhian N. Kamvar · Answer 2 · Tue Nov 06 2018 19:21:20 GMT+0800 (China Standard Time)

buttons don't display on the website, despite displaying fine when compiling using rmarkdown::render

My answer: https://www.youtube.com/watch?v=5hQXSsbQCMs

Zhian N. Kamvar · Answer 3 · Tue Nov 06 2018 19:26:16 GMT+0800 (China Standard Time)

once we have the final post, generate files to distribute to Malta's participants (folder containing data, adapted scripts, Rmd, etc.) // Zhian

I'm guaranteed to be in Malta and out of contact by then. I'll set up the initial materials as best I can, and then you can change them however you want before Malta.

Zhian N. Kamvar · Answer 4 · Wed Nov 14 2018 22:40:07 GMT+0800 (China Standard Time)

The technical issues are not going to be resolved by the time we get this to Malta, but they are fairly minor. I'm going to keep a checklist of things people have found while going through the practical:

From @finlaycampbell:

the code itself looks good to me and worked first try

line 335: remove "set" from "We set convert the unique identifiers"
line 693: add a full stop after readRDS()

jeskarp · Answer 5 · Thu Nov 15 2018 00:21:21 GMT+0800 (China Standard Time)

I found a few minor typos in the Stegen practical:

Data exploration - Going Further

line 497: "summaries each strata" -> "summarise each strata" (or stratum?)

Risk ratios

line 746: "inforamtion" -> "information"

Multiple variables - Going Further

line 1059: "calculcate" -> "calculate"
line 1093: "agrument" -> "argument"

Conclusion

lines 1222-1223: "suffisticated mapping and spatial mathodologies" -> "sophisticated mapping and spatial methodologies"

Thibaut Jombart · Answer 6 · Thu Nov 15 2018 19:51:08 GMT+0800 (China Standard Time)

Some more minor comments:

Loading required packages

package names: put in italic, and add links to relevant pages for each package; code provided below:
here: to find the path to data or script files
readxl: to read Excel spreadsheets into R
readr: to write (and read) spreadsheets as text files
incidence: to build epicurves
epitrix: to clean labels from our spreadsheet
dplyr: to help with factors
ggplot2: to create custom visualisations
epitools: to calculate risk ratios
sf: to read in shapefiles
leaflet: to demonstrate interactive maps
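Before running the practical, participants could check which of these packages are already installed. A minimal sketch, assuming the list above is complete; the object names `pkgs`, `installed` and `missing_pkgs` are hypothetical and not taken from the practical:

```r
# Packages listed above for the Stegen practical
pkgs <- c("here", "readxl", "readr", "incidence", "epitrix",
          "dplyr", "ggplot2", "epitools", "sf", "leaflet")

# requireNamespace() returns FALSE (instead of throwing an error)
# when a package is not installed; vapply() keeps the package names
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  # Tell the user what to install, e.g. with install.packages(missing_pkgs)
  message("Please install: ", paste(missing_pkgs, collapse = ", "))
}
```

Checking with requireNamespace() rather than library() means a missing package produces one readable message listing everything to install, instead of stopping at the first error.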

Colors used in graphics

replace the color for 'cases': it is currently very similar to the color for 'females'. Use #993333 for cases instead:

plot(i_ill, color = c("non case" = "#66cc99", "case" = "#993333"))

on this line and everywhere else.

the viridis color legend for p-values may work better on a log10 scale; don't do it if it's too much of a hassle, though
on all maps, use the same colors for cases / non-cases as elsewhere; this might just be a case of adding the following to the graphs:

 + scale_color_manual("Illness", values = c("non case" = "#66cc99", "case" = "#993333"))

the same could be done with leaflet, but again, not worth it if it can't be done in less than 20 min
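One way to keep the case / non-case colors identical across every plot is to define the palette once and reuse it. A minimal sketch in base R, assuming illness is stored as a factor with levels "non case" and "case"; the name `illness_colors` and the toy data are hypothetical:

```r
# Palette defined once, using the hex codes proposed above
illness_colors <- c("non case" = "#66cc99", "case" = "#993333")

# Toy data standing in for the Stegen illness variable (made-up values)
ill <- factor(c("case", "non case", "non case", "case", "non case"),
              levels = c("non case", "case"))
counts <- table(ill)

# Look colors up by level name, not by position, so each category
# keeps its color even if the factor levels are reordered
bar_colors <- illness_colors[names(counts)]
barplot(counts, col = bar_colors)

# The same vector could be passed to the other plots, e.g.
#   plot(i_ill, color = illness_colors)
#   ... + scale_color_manual("Illness", values = illness_colors)
```

Changing the palette then means editing one line rather than every plot call.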

jeskarp · Answer 7 · Thu Nov 22 2018 21:13:31 GMT+0800 (China Standard Time)

Here are a couple of suggestions for slightly changing bits of wording in the practical:

Risk ratios - Going Further: Univariate tests: the methods of testing are numbered incorrectly; they should be 1 and 2, not 2 and 3.
Risk ratios - Going Further: Is illness linked to gender? Make sure that case/non-case is across the top of the 2x2 table and the other variable is down the side. The current layout is not mathematically wrong, but it is unconventional for epidemiological analysis, and the other 2x2 tables in the practical are oriented the other way around.
i.e. instead of:
tab_ill_sex <- table(stegen$ill, stegen$sex)
it would be preferable to write:
tab_ill_sex <- table(stegen$sex, stegen$ill)
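With exposure on the rows and illness on the columns, the risk in each exposure group is simply a row proportion, which is part of why this layout is the convention. A minimal sketch on toy data; the values are made up, and `ill` is assumed to be coded 0/1:

```r
# Toy data standing in for the Stegen data (made-up values)
stegen <- data.frame(
  sex = c("female", "female", "female", "female",
          "male", "male", "male", "male"),
  ill = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Exposure (sex) down the side, outcome (ill) across the top
tab_sex_ill <- table(stegen$sex, stegen$ill)
tab_sex_ill

# Row proportions give the risk of illness within each sex
risk <- prop.table(tab_sex_ill, margin = 1)[, "1"]

# Risk ratio, females vs males
rr <- risk[["female"]] / risk[["male"]]
rr  # 3
```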

Zhian N. Kamvar · Answer 8 · Thu Nov 22 2018 23:18:50 GMT+0800 (China Standard Time)

In addition to @jeskarp's comments, it would also be good to keep everything in a single data frame, subsetting where needed. Currently, to test the risk ratios for each food item, we separate the food items into a new data frame called food. Several students were a bit confused/skeptical about this because Stata apparently likes to rearrange data when you use the keyword "keep" (I wasn't quite clear on that part).
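A sketch of the single-data-frame approach: loop over a vector of food item names and build each 2x2 table directly from `stegen`, so no separate `food` data frame is needed. The column names, values and the helper `risk_ratio()` are hypothetical, and `ill` and the food columns are assumed to be coded 0/1:

```r
# Toy data standing in for the Stegen data (hypothetical columns and values)
stegen <- data.frame(
  ill      = c(1, 1, 1, 0, 1, 0, 0, 0),
  tiramisu = c(1, 1, 1, 1, 0, 0, 0, 1),
  beer     = c(1, 0, 1, 0, 1, 0, 1, 0)
)
food_items <- c("tiramisu", "beer")

# Hypothetical helper: risk ratio from a table with exposure (0/1)
# on the rows and illness (0/1) on the columns
risk_ratio <- function(tab) {
  risk <- tab[, "1"] / rowSums(tab)
  unname(risk["1"] / risk["0"])
}

# One 2x2 table per food item, taken straight from the single data frame
food_tables <- lapply(food_items, function(item) {
  table(exposed = stegen[[item]], ill = stegen$ill)
})
names(food_tables) <- food_items

rr_by_item <- vapply(food_tables, risk_ratio, numeric(1))
rr_by_item

# When a subset really is needed, select columns on the fly rather than
# storing a copy, e.g. stegen[, c("ill", food_items)]
```

Because `stegen` is never split, every analysis in the practical keeps working on the same object, which avoids the "where did my data go?" confusion described above.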

Zhian N. Kamvar · Answer 9 · Sat Dec 08 2018 16:26:14 GMT+0800 (China Standard Time)

I'm going to close this. The technical issues brought up involve a lot more debugging of CSS than any of us are comfortable with.