Merging multiple files

Question

andkov opened this issue 8 years ago

The dto, as it emerges from 1-scale-assembly, is a list in which each element is a data frame. Each data frame is in long format with respect to time, and each contains the variables year and hhidpn.
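The structure described above can be checked before merging. The following is only a sketch: toy_dto and its values are made up for illustration and stand in for the real dto, which is not shown in this issue.

```r
# Hypothetical stand-in for dto: a named list of long-format data frames.
toy_dto <- list(
  demographics = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), male = c(TRUE, TRUE)),
  loneliness   = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), score_loneliness_3 = c(1.5, 2.0))
)
key_columns <- c("year", "hhidpn")

# For each element, check that both key columns are present.
has_keys <- vapply(toy_dto, function(d) all(key_columns %in% names(d)), logical(1))
print(has_keys)
```

Any element for which has_keys is FALSE could not be joined on year and hhidpn and would need to be fixed or left out first.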

Objective

Produce a single, flat data set that combines the data frames in the list.

Andriy V. Koval · Answer 1 · Fri Mar 10 2017 05:02:05 GMT+0800 (China Standard Time)

@casslbrown

First, we need to create a subset of the list dto that would contain ONLY the items we want to merge:

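library(magrittr) # provides the %>% pipe; dplyr must also be installed

# start with an empty list and add only the data frames we want to merge
dto_new <- list()

# keep the two key columns (year, hhidpn) plus the variables of interest
dto_new[["demographics"]] <- dto$demographics %>%
  dplyr::select(year, hhidpn, birthyr, interview_yr, male, race)

dto_new[["loneliness"]] <- dto$loneliness %>%
  dplyr::select(year, hhidpn, score_loneliness_3, score_loneliness_11)

# sum and mean are generic names, so give them scale-specific names
dto_new[["life_satisfaction"]] <- dto$life_satisfaction %>%
  dplyr::select(year, hhidpn, sum, mean) %>%
  dplyr::rename(
     life_sat_sum  = sum
    ,life_sat_mean = mean
  )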

The script above shows how to create a dto_new that keeps the same list structure as dto but drops the unnecessary items. Where necessary, you must rename columns so that their names are unique in the combined file (e.g. if more than one scale has columns sum and mean, you must give them unique names, such as life_satisfaction_sum and life_satisfaction_mean).
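One way to spot columns that still need renaming is to look for non-key names that occur in more than one data frame. This is a minimal sketch on made-up data: the toy list and the social_support scale are hypothetical and only stand in for dto_new.

```r
# Hypothetical toy list standing in for dto_new (values are made up).
toy <- list(
  life_satisfaction = data.frame(year = 2004, hhidpn = 1, sum = 20, mean = 4),
  social_support    = data.frame(year = 2004, hhidpn = 1, sum = 12, mean = 3)
)
key_columns <- c("year", "hhidpn")

# Collect every non-key column name across all data frames in the list.
non_key_names <- unlist(
  lapply(toy, function(d) setdiff(names(d), key_columns)),
  use.names = FALSE
)

# Names that occur in more than one data frame and therefore need renaming.
clashing_names <- unique(non_key_names[duplicated(non_key_names)])
print(clashing_names)
```

If such clashes are left in place, merge() does not fail on the first join but appends the suffixes .x and .y to the clashing columns, which makes it hard to tell which scale a column came from.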

The snapshot of the produced dataframes is displayed below.

Notice how each data frame has two identical columns: year and hhidpn. This is important because we will join the data frames on these columns. Use the following function to merge multiple data frames by the same key:

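# ds_list: a list of data frames; by_columns: names of the shared key columns
merge_mulitple_files <- function(ds_list, by_columns){
  # merge the data frames pairwise, left to right; merge() defaults to an
  # inner join, so only key combinations present in every data frame are kept
  Reduce(function( d_1, d_2 ) merge(d_1, d_2, by=by_columns), ds_list)
}
ds <- merge_mulitple_files(dto_new, by_columns = c("year","hhidpn"))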
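To see what the Reduce() and merge() combination does, here is a small sketch on made-up data. The toy list and its values are hypothetical. The all = TRUE variant is not part of the function above and is shown only as an option in case rows with incomplete data should be kept.

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- list(
  demographics      = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), male = c(TRUE, TRUE)),
  loneliness        = data.frame(year = c(2004, 2006), hhidpn = c(1, 1), score_loneliness_3 = c(1.5, 2.0)),
  life_satisfaction = data.frame(year = 2004, hhidpn = 1, life_sat_mean = 4.2)
)
by_columns <- c("year", "hhidpn")

# Inner join (the merge() default): keeps only the year-hhidpn pairs
# that are present in every data frame of the list.
ds_inner <- Reduce(function(d_1, d_2) merge(d_1, d_2, by = by_columns), toy)

# Full join: all = TRUE keeps every year-hhidpn pair and fills gaps with NA.
ds_full <- Reduce(function(d_1, d_2) merge(d_1, d_2, by = by_columns, all = TRUE), toy)

print(ds_inner)
print(ds_full)
```

In this toy example the 2006 wave has no life satisfaction record, so it is dropped from ds_inner and kept in ds_full with a missing life_sat_mean.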

The snapshot of the created data frame appears below.