mims-harvard / TDC

Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science

Home Page: https://tdcommons.ai


Bug in loading scPerturb datasets

abearab opened this issue · comments

Hi @kexinhuang12345, as you know, the ReplogleWeissman2022 study has three datasets.


Currently, as I understand it, the ReplogleWeissman2022_K562_gwps data is not uploaded. However, I noticed some odd behavior when I tried to load it: I already had ReplogleWeissman2022_k562_essential downloaded in a path folder, and when I then tried loading scperturb_gene_ReplogleWeissman2022_K562_gwps, it reported Found local copy...!

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets')
Found local copy...
Loading...

Looking at the number of perturbations, the loaded data cannot be the _gwps dataset: it should contain 9867 unique perturbations, but it has 2058, the same count as the _essential dataset.

>>> test_load.adata.obs.perturbation.unique()

Length: 2058
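The mismatch above can be checked programmatically by comparing the unique-perturbation count against what each dataset should contain. A minimal sketch, assuming the counts quoted in this thread and a hypothetical helper name (`has_expected_perturbations` is not part of TDC):

```python
# Hypothetical sanity check: verify that the number of unique perturbation
# labels in a loaded dataset matches the count expected for that dataset.
# Expected counts are taken from the discussion above.
EXPECTED_PERTURBATIONS = {
    "scperturb_gene_ReplogleWeissman2022_k562_essential": 2058,
    "scperturb_gene_ReplogleWeissman2022_K562_gwps": 9867,
}

def has_expected_perturbations(dataset_name, perturbation_labels):
    """Return True if the unique-label count matches the expected count."""
    expected = EXPECTED_PERTURBATIONS[dataset_name]
    return len(set(perturbation_labels)) == expected
```

With `test_load` as above, passing `test_load.adata.obs.perturbation` for the `_gwps` name would return False, confirming that the wrong dataset was loaded.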

Looking more carefully, I tried an empty folder and noticed that, for some reason, the wrong file is downloaded for _gwps.

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets/new/')
Downloading...
█████████████████████████████████████████████| 1.55G/1.55G [01:09<00:00, 22.2MiB/s]
Loading...
~: ls Datasets/new/

scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad
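One way both symptoms could arise (this is speculation only, not TDC's actual code) is a local-copy lookup that matches files on the shared study prefix rather than the full dataset name. The function name and matching logic below are hypothetical:

```python
def resolve_local_file(dataset_name, folder_files):
    """Hypothetical local-copy lookup that matches on the study prefix.

    If the check only compares the lowercased study prefix shared by all
    three datasets, a request for '..._K562_gwps' matches the
    '..._k562_essential.h5ad' file already on disk, so the loader reports
    'Found local copy...' and returns the wrong dataset.
    """
    # Drop the dataset-specific suffix (e.g. '_K562_gwps') and lowercase.
    study_prefix = "_".join(dataset_name.split("_")[:3]).lower()
    for fname in folder_files:
        if fname.lower().startswith(study_prefix):
            return fname  # first file from the same study wins
    return None
```

Under this sketch, requesting `scperturb_gene_ReplogleWeissman2022_K562_gwps` against a folder containing only `scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad` returns the essential file, reproducing the behavior shown above.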

cc @amva13

Originally posted by @abearab in #239 (comment)

@kexinhuang12345 – hi Kexin, I was wondering if you could take a look at this issue. Thanks!

Hi! Sorry for the delay – I think it is due to a name-caching bug; currently we do not have the gwps version uploaded to Dataverse. Will fix it after the NeurIPS deadline!

> Hi! Sorry for the delay – I think it is due to a name-caching bug; currently we do not have the gwps version uploaded to Dataverse.

I see, that makes sense.

> Will fix it after the NeurIPS deadline!

Thanks!