Lecturer: Jonas Schöley (j.schoeley@uni-rostock.de)
University of Rostock, Summer 2022
Please install the current versions of
- R (https://cran.r-project.org/) and
- RStudio (https://www.rstudio.com/products/rstudio/download/).
- Thu Apr 14, 09-11: Probabilities of Survival
- Lecture slides
- Lecture R code
- R code for the interactive distribution plot
- Homework: Choose a time-to-event setting that interests you and look up a constant rate related to that setting. What is the time scale for your setting? When does the time-to-event start? When has half of the population experienced the event given the chosen rate?
- Recap probability densities
- Recap random variables
- Recap calculus
- Thu Apr 21, 09-11: From Data to Distribution
- Lecture slides
- Lecture R code
- Breastcancer data
- R code for the interactive likelihood demonstration
- Homework: Ensure that you have a working installation of R and RStudio. Make a new folder for this course. Associate this folder with an RStudio project. Write and save an R script that loads the breastcancer data into the R session.
- Thu Apr 28, 09-11: Incomplete Observations
- Thu May 12, 09-11: Trial Exam
- Thu May 19, 09-11: Exam
- Thu June 02, 09-11: The Kaplan-Meier Estimator
- Lecture slides
- Lecture R code
- Breastcancer data
- Cheat sheets are short introductions to certain topics:
- Homework: Using R, produce a Kaplan-Meier plot related to the topic of your eventual seminar paper ("Hausarbeit"). You don’t need to have all the data for your topic yet, but you need to find “some” related data. Be prepared to present your plot to the group. Think about study time start and end, event of interest, and censoring. You may compare multiple groups, but a KM-plot for a single group is fine as well.
- Thu June 16, 09-11: The Logrank Test
- Thu June 23, 09-11: The Cox Proportional Hazards Regression
- Thu June 30, 09-11: Effect modification
- Thu July 07, 09-11: [Student Presentations]
- Homework: Please prepare a 10-minute presentation outlining your plan for the term paper. What is the topic? What research questions do you want to answer? What data set do you plan on using? Do you already have some preliminary results? If so, show them. What difficulties have you encountered thus far?
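For the first homework in the schedule above (a constant rate and the time by which half of the population has experienced the event), the calculation can be sketched in a few lines of R. This is only a minimal illustration: the rate of 0.1 events per person-year is a made-up value, and the names `rate`, `survival`, and `median_time` are chosen here for the example. Under a constant rate, the survival function is S(t) = exp(-rate * t), so S(t) = 0.5 is reached at t = log(2) / rate.

```r
# Assumed, purely illustrative constant rate: 0.1 events per person-year.
# Replace this with the rate you looked up for your own setting.
rate <- 0.1

# Survival function under a constant rate (exponential distribution):
# probability of not yet having experienced the event by time t
survival <- function(t, rate) exp(-rate * t)

# Time at which half of the population has experienced the event:
# solve exp(-rate * t) = 0.5 for t, which gives t = log(2) / rate
median_time <- log(2) / rate

median_time                  # about 6.93 (in years, given the assumed rate)
survival(median_time, rate)  # 0.5, confirming the solution
```

The unit of `median_time` is the inverse of the unit of the rate, so a rate per person-month would give a time in months.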
At the end of the seminar you will write a term paper. I'd like you to answer some research questions with the methods of survival analysis using a dataset and topic of your choice. Here are some basic examples:
- Topic: "Educational differences in timing of first Cannabis use"
- Q: Are there educational differences in the share of people who consumed Cannabis at least once by the age of 20? (Kaplan-Meier; Logrank test)
- Q: Do the educational differences in the risk of first use of Cannabis change across cohorts? (Cox regression with cohort-education interaction)
- Q: Among those who have consumed Cannabis at least once by the age of 20, is there an educational difference in the timing of first use? (Kaplan-Meier, Logrank test)
- Data: I've heard some of you used a data set like this in a seminar on logistic regression.
- Topic: "The role of stage at diagnosis in explaining differences in lung cancer survival by ethnicity"
- Q: Are lung cancer cases in black patients diagnosed at a later stage compared to non-black patients? (Chi-squared test)
- Q: Does the risk of death following lung cancer vary by ethnicity? (Kaplan-Meier, Logrank test, Cox regression)
- Q: How much does the risk of death following lung cancer vary by stage of diagnosis? (Kaplan-Meier, Logrank test, Cox regression)
- Q: Is the "black disadvantage" of lung cancer survival by stage of diagnosis smaller than the overall "black disadvantage"? (Kaplan-Meier, Logrank test, Cox regression with and without stage by ethnicity interaction)
- Data: SEER.
- Topic: "Cohort differences in the timing of leaving the parental home"
- Q: How did the median age of leaving the parental home evolve over the birth cohorts 1973 to 1993? (Kaplan-Meier, Logrank test)
- Q: Is the sex difference in the timing of leaving the parental home diverging or converging across cohorts? (Cox regression with cohort-sex interaction)
- Data: Pairfam.
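The methods named in the example questions above (Kaplan-Meier, Logrank test, Cox regression with an interaction) might be combined roughly as follows. This is a sketch under stated assumptions, not a template for any of the topics: it uses the `lung` data set that ships with the `survival` package rather than SEER or Pairfam, and the grouping variable `older` with its cut-off at age 65 is invented here purely to have something to interact with sex.

```r
library(survival)

# Example data shipped with the survival package (an assumption for this
# sketch; it is not one of the data sets suggested above)
d <- survival::lung
d$event <- d$status == 2   # in this data set: 1 = censored, 2 = dead
d$sex   <- factor(d$sex, levels = c(1, 2), labels = c("male", "female"))
d$older <- d$age >= 65     # hypothetical grouping, cut-off chosen arbitrarily

# Kaplan-Meier estimate of the survival curve, separately by sex
km <- survfit(Surv(time, event) ~ sex, data = d)
plot(km, lty = 1:2, xlab = "Days since study entry",
     ylab = "Survival probability")
legend("topright", legend = levels(d$sex), lty = 1:2)

# Logrank test: do the survival curves differ by sex?
lr <- survdiff(Surv(time, event) ~ sex, data = d)
lr

# Cox regression with an interaction: does the sex difference in the
# hazard of death differ between the two age groups?
cox <- coxph(Surv(time, event) ~ sex * older, data = d)
summary(cox)
```

In the Cox model, the coefficient of the interaction term indicates whether the hazard ratio between the sexes differs across the two age groups, which mirrors the "cohort-education" and "cohort-sex" interactions in the example questions.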
The term paper should be around 20 pages in total and feature all the usual parts of an empirical study, e.g.
- introduction
- background
- research questions & hypotheses
- methods
- data
- results
- discussion
- references
The idea is for you to do a sort of "mini" master's thesis: a small, independent, empirical research project. You may have a look at my master's thesis for guidance. You may write in German or English. Send the finished term paper to j.schoeley@uni-rostock.de.
- Survival analysis:
- Klein & Moeschberger (2003). Survival Analysis is a classic textbook on survival analysis.
- Harrell (2015). Regression Modeling Strategies is a great reference for all regression-related topics. Its chapters on survival analysis feature R code and are very approachable.
- Kaplan & Meier (1958). Nonparametric Estimation from Incomplete Observations. The most cited statistics paper and one of the fundamental methods of survival analysis.
- Cox (1972). Regression Models and Life-Tables. The second most cited statistics paper and the other pillar of survival analysis.
- Data visualization:
- Claus Wilke: Fundamentals of Data Visualization for general principles of data visualization.
- Hadley Wickham: ggplot2 for an introduction to a very flexible and popular R plotting package. Well worth learning.