New changes in tabix_region break existing code

Question

hsun3163 opened this issue 6 months ago

The new commit c146a06 breaks the existing prepare_data_list function with the following error:

Caused by error in `.f()`:
! No common complete samples between genotype and phenotype/covariate data
Backtrace:
     ▆
  1. ├─pecotmr::load_regional_functional_data(...)
  2. │ └─pecotmr:::load_regional_association_data(...)
  3. │   └─pecotmr:::prepare_data_list(...)
  4. │     └─... %>% ...
  5. ├─dplyr::select(...)
  6. ├─dplyr::mutate(...)
  7. ├─dplyr:::mutate.data.frame(...)
  8. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  9. │   ├─base::withCallingHandlers(...)
 10. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 11. │     └─mask$eval_all_mutate(quo)
 12. │       └─dplyr (local) eval()
 13. ├─purrr::map2(...)
 14. │ └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
 15. │   ├─purrr:::with_indexed_errors(...)
 16. │   │ └─base::withCallingHandlers(...)
 17. │   ├─purrr:::call_with_cleanup(...)
 18. │   └─pecotmr (local) .f(.x[[i]], .y[[i]], ...)
 19. │     └─base::stop("No common complete samples between genotype and phenotype/covariate data")
 20. └─base::.handleSimpleError(...)
 21.   └─purrr (local) h(simpleError(msg, call))
 22.     └─cli::cli_abort(...)
 23.       └─rlang::abort(...)
There were 50 or more warnings (use warnings() to see the first 50)

This happens because, in the new version, tabix_region no longer uses the column names of the original data frame as the column names of its output, so the tabix_table in the pheno list object ends up without the sample names.

The latest PR attempts to fix this, but the fix is incomplete: even when the colnames are set properly, setting them after the initial load messes up the column data types of the tabix_region output.

Therefore, if header = FALSE in {fread(cmd = paste0("tabix -h ", file, " ", region), sep="auto", header = FALSE)} is not necessary, it is best to remove it.
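
To illustrate the data type problem, here is a minimal sketch that does not call tabix at all. It assumes the text returned by `tabix -h` looks like one header line followed by data rows; the column and sample names (`sample_1`, `sample_2`) are invented for the example and are not taken from the real phenotype files.

```r
library(data.table)

# Toy stand-in for the output of `tabix -h`: a header line plus two data rows.
# The column and sample names here are hypothetical.
lines <- c(
  "#chr\tstart\tend\tsample_1\tsample_2",
  "chr1\t100\t200\t0.5\t1.2",
  "chr1\t300\t400\t0.7\t0.9"
)

# header = FALSE: the header line is read as the first data row,
# so every column is parsed as character.
no_header <- fread(text = lines, sep = "\t", header = FALSE)

# Setting the names afterwards restores the labels but not the types.
renamed <- no_header[-1]
setnames(renamed, as.character(unlist(no_header[1])))

# header = TRUE: names come from the header line and the
# numeric columns keep their numeric types.
with_header <- fread(text = lines, sep = "\t", header = TRUE)

print(sapply(renamed, class))
print(sapply(with_header, class))
```

In this toy case `renamed` and `with_header` have identical column names, but only `with_header` has numeric sample columns, which matches the symptom described above.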

gaow · Answer 1 · Mon Jan 29 2024 20:11:54 GMT+0800 (China Standard Time)

I can't recall why header was set to false, but I take it that you can always make it a parameter for tabix_region and make it false by default?

hsun3163 · Answer 2 · Mon Jan 29 2024 23:41:05 GMT+0800 (China Standard Time)

I tried the proposed changes. As it turns out, there are other changes in load_phenotype_data that break the pipeline. After header is set to true in tabix_region, the first data rows are now set as the rownames instead of the header. I won't be able to work on it further until 5, though.

gaow · Answer 3 · Tue Jan 30 2024 01:02:17 GMT+0800 (China Standard Time)

We can also talk in our 1:1 later today if that is easier. Perhaps we can leave the default as header = "auto". It's just important to allow the user to set this option, now that we need this piece of code in multiple places.
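
A rough sketch of what exposing the option could look like is below. The function name `tabix_region_sketch` is hypothetical and this is not the actual pecotmr implementation; it only wraps the fread call quoted earlier in the thread and passes `header` through, with `"auto"` (fread's own default) as the default. The `shQuote()` calls are an addition for shell safety, not something from the original code.

```r
library(data.table)

# Hypothetical wrapper, not the real pecotmr::tabix_region.
# `header` is forwarded to fread, which accepts "auto", TRUE, or FALSE.
tabix_region_sketch <- function(file, region, header = "auto") {
  # Requires the tabix executable to be available on the PATH.
  cmd <- paste("tabix -h", shQuote(file), shQuote(region))
  fread(cmd = cmd, sep = "auto", header = header)
}
```

A caller that needs the old behaviour could then pass `header = FALSE` explicitly, while other call sites rely on the default.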