IALSA / HRS

Shaping data from the Health and Retirement Study.

Merging multiple files

andkov opened this issue

The `dto` list, as it emerges from `1-scale-assembly`, has a data frame as each element. Each data frame is long with respect to time, and each contains the variables `year` and `hhidpn`.
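
To experiment with the merge without the real HRS files, a toy stand-in for `dto` can be built by hand. This is a minimal sketch: the values are invented and only the column names used in this issue are reproduced. The real data frames contain many more columns and respondents.

```r
# Hypothetical toy version of `dto`: a named list of long data frames,
# each keyed by `year` and `hhidpn`. All values are invented for illustration.
dto <- list(
  demographics = data.frame(
    year         = c(2004, 2006, 2004, 2006),
    hhidpn       = c(1001, 1001, 1002, 1002),
    birthyr      = c(1940, 1940, 1935, 1935),
    interview_yr = c(2004, 2006, 2004, 2006),
    male         = c(1, 1, 0, 0),
    race         = c(1, 1, 2, 2)
  ),
  loneliness = data.frame(
    year                = c(2004, 2006, 2006),
    hhidpn              = c(1001, 1001, 1002),
    score_loneliness_3  = c(1.3, 1.7, 2.0),
    score_loneliness_11 = c(1.5, 1.8, 2.2)
  ),
  life_satisfaction = data.frame(
    year   = c(2004, 2006, 2004, 2006),
    hhidpn = c(1001, 1001, 1002, 1002),
    sum    = c(25, 27, 20, 22),
    mean   = c(5.0, 5.4, 4.0, 4.4)
  )
)

# Show the top-level structure: one data frame per element
str(dto, max.level = 1)
```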

Objective

Produce a single, flat data set that combines the data frames in the list.

@casslbrown

First, create a subset of the list `dto` that contains ONLY the items (and columns) we want to merge:

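```r
library(magrittr)  # provides the %>% pipe used below

dto_new <- list()

# Demographics: key columns plus selected covariates
dto_new[["demographics"]] <- dto$demographics %>%
  dplyr::select(year, hhidpn, birthyr, interview_yr, male, race)

# Loneliness: key columns plus the two scale scores
dto_new[["loneliness"]] <- dto$loneliness %>%
  dplyr::select(year, hhidpn, score_loneliness_3, score_loneliness_11)

# Life satisfaction: rename the generic columns so they stay unique after merging
dto_new[["life_satisfaction"]] <- dto$life_satisfaction %>%
  dplyr::select(year, hhidpn, sum, mean) %>%
  dplyr::rename(
     life_sat_sum  = sum
    ,life_sat_mean = mean
  )
```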

The script above creates `dto_new`, which keeps the structure of `dto` (a named list of data frames) but prunes the unnecessary items and columns. Where necessary, rename columns so that their names are unique in the merged file. For example, if more than one scale has columns `sum` and `mean`, give them unique names such as `life_satisfaction_sum` and `life_satisfaction_mean`.
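
Unique names matter because base R's `merge()` appends the suffixes `.x` and `.y` to non-key columns that share a name, which quickly becomes ambiguous across several scales. As an alternative to renaming each scale by hand, the helper `prefix_non_key_columns()` below prefixes every non-key column with the name of its list element. It is a hypothetical sketch, not part of the repository; it assumes a named list whose key columns are `year` and `hhidpn`, and the `depression` scale and its values are invented.

```r
# Two toy scales that both carry generic `sum` and `mean` columns (invented data)
scales <- list(
  life_satisfaction = data.frame(year = c(2004, 2006), hhidpn = c(1, 1),
                                 sum = c(25, 27), mean = c(5.0, 5.4)),
  depression        = data.frame(year = c(2004, 2006), hhidpn = c(1, 1),
                                 sum = c(3, 2), mean = c(0.4, 0.3))
)

# Without renaming, merge() disambiguates the clash with .x / .y suffixes
clashing <- merge(scales$life_satisfaction, scales$depression,
                  by = c("year", "hhidpn"))
names(clashing)
# "year" "hhidpn" "sum.x" "mean.x" "sum.y" "mean.y"

# Hypothetical helper: prefix each non-key column with its list element name
prefix_non_key_columns <- function(df_list, keys = c("year", "hhidpn")) {
  for (nm in names(df_list)) {
    cols <- names(df_list[[nm]])
    non_key <- !(cols %in% keys)
    cols[non_key] <- paste(nm, cols[non_key], sep = "_")
    names(df_list[[nm]]) <- cols
  }
  df_list
}

renamed <- prefix_non_key_columns(scales)
names(renamed$depression)
# "year" "hhidpn" "depression_sum" "depression_mean"
```

The trade-off is that automatic prefixes can get long (`life_satisfaction_mean`), whereas hand-picked names such as `life_sat_mean` stay short.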

A snapshot of the data frames in `dto_new` is shown below.
[Image: snapshot of the data frames in `dto_new`]

Notice how each data frame has the same two key columns, `year` and `hhidpn`. This is important because we will join the data frames by these columns. Use the following function to merge multiple data frames on the same key:

```r
# Successively merge each data frame in the list on the shared key columns
merge_mulitple_files <- function(df_list, by_columns){
  Reduce(function(d_1, d_2) merge(d_1, d_2, by = by_columns), df_list)
}
ds <- merge_mulitple_files(dto_new, by_columns = c("year", "hhidpn"))
```
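
Note that `merge()` performs an inner join by default (`all = FALSE`), so a respondent-year missing from any one data frame is dropped from the result. If every respondent-year should be kept, one option is to pass `all = TRUE`, which performs a full outer join and fills the gaps with `NA`. `merge_all_rows()` below is a hypothetical variant, not part of the repository, and the toy data are invented.

```r
# Hypothetical variant of the merge helper that keeps all rows (full outer join)
merge_all_rows <- function(df_list, by_columns) {
  Reduce(function(d_1, d_2) merge(d_1, d_2, by = by_columns, all = TRUE), df_list)
}

# Toy data: loneliness was only measured in the 2006 wave for this respondent
toy <- list(
  demographics = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), male = c(1, 1)),
  loneliness   = data.frame(year = 2006, hhidpn = 1, score_loneliness_3 = 1.7)
)

inner <- Reduce(function(d_1, d_2) merge(d_1, d_2, by = c("year", "hhidpn")), toy)
outer <- merge_all_rows(toy, by_columns = c("year", "hhidpn"))

nrow(inner)  # 1: only the 2006 wave appears in both data frames
nrow(outer)  # 2: the 2004 wave is kept, with NA for score_loneliness_3
```

Which join is appropriate depends on the analysis: the inner join yields complete cases only, while the outer join preserves every wave a respondent contributed to any scale.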

A snapshot of the merged data frame `ds` appears below.
[Image: snapshot of the merged data frame `ds`]
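
The merge relies on each (`year`, `hhidpn`) pair appearing at most once in every data frame. When a pair is duplicated, `merge()` returns a row for every matching combination, so the flat file can silently gain rows. A pre-merge check along these lines may help. `check_merge_keys()` is a hypothetical helper, not part of the repository, and it assumes the key columns are `year` and `hhidpn`.

```r
# Hypothetical pre-merge check: every data frame must contain the key
# columns, and each combination of key values must appear only once.
check_merge_keys <- function(df_list, keys = c("year", "hhidpn")) {
  for (i in seq_along(df_list)) {
    d <- df_list[[i]]
    label <- if (is.null(names(df_list))) paste("element", i) else names(df_list)[i]
    missing_keys <- setdiff(keys, names(d))
    if (length(missing_keys) > 0) {
      stop(label, " is missing key column(s): ", paste(missing_keys, collapse = ", "))
    }
    if (anyDuplicated(d[, keys, drop = FALSE]) > 0) {
      stop(label, " has duplicated (", paste(keys, collapse = ", "), ") combinations")
    }
  }
  invisible(TRUE)
}

# Toy data: `ok` has unique keys, `bad` repeats the pair (2004, 1)
ok  <- list(a = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), x = 1:2))
bad <- list(b = data.frame(year = c(2004, 2004), hhidpn = c(1, 1), y = 1:2))

check_merge_keys(ok)        # returns TRUE invisibly
try(check_merge_keys(bad))  # error: b has duplicated (year, hhidpn) combinations
```

Running such a check on `dto_new` before merging would stop early and name the offending data frame, rather than letting duplicated keys inflate the merged result.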