New changes in tabix_region break existing code
hsun3163 opened this issue
The new commit c146a06 breaks the existing prepare_data_list
function with the following error:
Caused by error in `.f()`:
! No common complete samples between genotype and phenotype/covariate data
Backtrace:
▆
1. ├─pecotmr::load_regional_functional_data(...)
2. │ └─pecotmr:::load_regional_association_data(...)
3. │ └─pecotmr:::prepare_data_list(...)
4. │ └─... %>% ...
5. ├─dplyr::select(...)
6. ├─dplyr::mutate(...)
7. ├─dplyr:::mutate.data.frame(...)
8. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
9. │ ├─base::withCallingHandlers(...)
10. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
11. │ └─mask$eval_all_mutate(quo)
12. │ └─dplyr (local) eval()
13. ├─purrr::map2(...)
14. │ └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
15. │ ├─purrr:::with_indexed_errors(...)
16. │ │ └─base::withCallingHandlers(...)
17. │ ├─purrr:::call_with_cleanup(...)
18. │ └─pecotmr (local) .f(.x[[i]], .y[[i]], ...)
19. │ └─base::stop("No common complete samples between genotype and phenotype/covariate data")
20. └─base::.handleSimpleError(...)
21. └─purrr (local) h(simpleError(msg, call))
22. └─cli::cli_abort(...)
23. └─rlang::abort(...)
There were 50 or more warnings (use warnings() to see the first 50)
This happens because, in the new version, tabix_region no longer registers the colnames of the original dataframe as column names, so the tabix_table in the pheno list object ends up without the sample names.
An attempt to fix it is shown in the latest PR, but the fix is not complete: even if the colnames are set properly, setting them after the initial loading messes up the column data types of the tabix_region output.
Therefore, if the header = FALSE
in fread(cmd = paste0("tabix -h ", file, " ", region), sep="auto", header = FALSE)
is not necessary, it is best to remove it.
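To illustrate the type problem described above, here is a minimal sketch using an in-memory stand-in for the `tabix -h` output. The column names and values are made up, and it assumes the real file has a single header line followed by numeric data columns. It only shows how fread treats the header argument and is not the pecotmr code itself.

```r
library(data.table)

# Toy stand-in for `tabix -h` output: one header line, then data rows.
# Column and sample names are hypothetical.
lines <- c(
  "#chr\tstart\tend\tsample_1\tsample_2",
  "chr1\t100\t200\t0.5\t1.2",
  "chr1\t300\t400\t0.7\t0.9"
)

# header = FALSE: the header line is read as a data row, so every column
# contains at least one non-numeric string and is typed as character.
no_header <- fread(text = lines, sep = "auto", header = FALSE)
print(sapply(no_header, class))

# Renaming the columns afterwards does not restore the numeric types.
fixed_late <- no_header[-1]
setnames(fixed_late, as.character(unlist(no_header[1])))
print(sapply(fixed_late, class))

# header = TRUE: the first line supplies the column names and the data
# columns are typed from the data rows only.
with_header <- fread(text = lines, sep = "auto", header = TRUE)
print(sapply(with_header, class))
```

In this toy case the sample columns stay character after the late rename, which would be consistent with the downstream sample matching failing, although the exact cause in the pipeline may differ.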
I can't recall why header was set to FALSE, but I take it you could always make it a parameter of tabix_region and make it FALSE by default?
I tried the proposed changes. As it turns out, there are other changes in load_phenotype_data that break the pipeline. After the tabix_region header is set to TRUE, the first data rows are now set as the rownames instead of the header. I won't be able to work on it further until 5 though.
We can also talk in our 1:1 later today if that is easier. Perhaps we can leave the default as header = "auto". It's just important to allow this option to be set by the user, now that we need this piece of code in multiple places.
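One possible shape for that option is sketched below. The function name tabix_region_sketch is hypothetical and the body is reduced to the fread call quoted earlier in this thread; the real tabix_region may do more. The only change is that header is exposed as an argument and defaults to "auto", which is also fread's own default.

```r
library(data.table)

# Hypothetical variant of tabix_region, not the actual pecotmr function.
# `header` is passed straight through to fread: "auto", TRUE, or FALSE.
tabix_region_sketch <- function(file, region, header = "auto") {
  # Same command as in the thread; requires the tabix CLI on the PATH.
  cmd <- paste0("tabix -h ", file, " ", region)
  fread(cmd = cmd, sep = "auto", header = header)
}

# Example call (file name and region are placeholders, so it is left commented out):
# pheno <- tabix_region_sketch("pheno.bed.gz", "chr1:1-1000000", header = TRUE)
```

Callers that relied on the old behaviour could still pass header = FALSE explicitly, while load_phenotype_data could request header = TRUE.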