morinlab / GAMBLR.data

Collection of Curated Data for Genomic Analysis of Mature B-cell Lymphomas in R

Home Page: https://morinlab.github.io/GAMBLR.data/

Discrepancies in "Lymph Gen" column names between the two bundled metadata objects.

mattssca opened this issue

In the metadata bundled in sample_data, the LymphGen column is called "LymphGen", whereas the all-too-familiar output from get_gambl_metadata and the bundled metadata in this package both refer to the same column as "lymphgen".

This discrepancy causes errors if a user accesses the metadata in the list (sample_data) and pipes it to other GAMBL functions as a regular metadata object.

> any(names(get_gambl_metadata()) == 'LymphGen')
[1] FALSE
> any(names(get_gambl_metadata()) == 'lymphgen')
[1] TRUE
> any(names((GAMBLR.data::sample_data$meta)) == 'LymphGen')
[1] TRUE
> any(names((GAMBLR.data::sample_data$meta)) == 'lymphgen')
[1] FALSE

library(dplyr) # provides %>%; ashm_rainbow_plot is assumed to be loaded already

# get data
dohh2_maf = GAMBLR.data::sample_data$grch37$maf %>% dplyr::filter(Tumor_Sample_Barcode == "DOHH-2")
dohh2_meta = GAMBLR.data::sample_data$meta %>% dplyr::filter(sample_id == "DOHH-2")

# build plot (this call fails with the error below)
ashm_rainbow_plot(this_maf = dohh2_maf, metadata = dohh2_meta, region = "chr6:90975034-91066134")

Error in `$<-.data.frame`(`*tmp*`, classification, value = integer(0)) : replacement has 0 rows, data has 1
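
Until the bundled object is corrected, one possible workaround is to rename the column before handing the metadata to other functions. The sketch below uses a toy data frame in place of GAMBLR.data::sample_data$meta, and rename_columns is a hypothetical helper written for this example, not part of any GAMBL package; the column values are placeholders.

```r
# Toy stand-in for GAMBLR.data::sample_data$meta (values are placeholders).
toy_meta <- data.frame(
  sample_id = "DOHH-2",
  LymphGen = "placeholder_class",
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename columns using a named vector (old name = new name).
# Columns that are not listed in the mapping are left untouched.
rename_columns <- function(df, mapping) {
  hits <- names(df) %in% names(mapping)
  names(df)[hits] <- unname(mapping[names(df)[hits]])
  df
}

fixed_meta <- rename_columns(toy_meta, c(LymphGen = "lymphgen"))
names(fixed_meta)
```

The same mapping vector can be extended with further old/new pairs if other columns turn out to differ only in capitalisation.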

Has this issue been resolved?

No, but I can definitely address this today 🙂

Ok, awesome! 😎

The Sex column is also discrepant (capitalised in one object and all lower case in the other), so it has to be addressed too 👍

Should be fixed now; I will push this to the PR.

> colnames(sample_data$meta)
 [1] "patient_id"           "sample_id"            "Tumor_Sample_Barcode"
 [4] "seq_type"             "sex"                  "COO_consensus"
 [7] "lymphgen"             "genetic_subgroup"     "EBV_status_inf"
[10] "cohort"               "pathology"            "reference_PMID"
> colnames(sample_data$meta)[!colnames(sample_data$meta) %in% colnames(GAMBLR.data::gambl_metadata)]
[1] "genetic_subgroup" "reference_PMID"
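
A check along the same lines could flag column names that differ from the reference only by case, which is the kind of mismatch reported in this issue. This is a minimal sketch on made-up name vectors; find_case_mismatches is a hypothetical helper and the vectors are not the real column sets of either object.

```r
# Made-up column names standing in for the two metadata objects.
bundled_names <- c("sample_id", "Sex", "LymphGen", "cohort")
reference_names <- c("sample_id", "sex", "lymphgen", "cohort")

# Hypothetical helper: report names in x that are absent from the reference
# but match a reference name once case is ignored.
find_case_mismatches <- function(x, reference) {
  absent <- setdiff(x, reference)
  idx <- match(tolower(absent), tolower(reference))
  keep <- !is.na(idx)
  data.frame(
    found = absent[keep],
    expected = reference[idx[keep]],
    stringsAsFactors = FALSE
  )
}

mismatches <- find_case_mismatches(bundled_names, reference_names)
print(mismatches)
```

Names that are simply extra, with no case-insensitive match in the reference, are deliberately not reported by this helper.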

This is now fixed in #30