jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium

How does one load pilot data sets?

asfandyarazhar13 opened this issue

Just wanted to ask how one would load the pilot data sets (cpg0000-jump-pilot, cpg0001-cellpainting-protocol, and cpg0002-jump-scope)? How would that differ from the provided sample notebook? Is there a setup in which the images are handled by a PyTorch data loader? Thanks in advance!

Hi @asfandyarazhar13,

For the pilot data sets, the images are in the same relative location as in cpg0016, but the profiles are not in .parquet format and the metadata is also in a different format. Please check our GitHub repo for cpg0000 for more details on how to interact with that dataset: https://github.com/jump-cellpainting/2021_Chandrasekaran_submitted. If you need any clarification, please let us know.

> Is there a setup in which the images are handled by a PyTorch data loader?

Sorry, I am not aware of such a setup that I can share.
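For readers looking for a starting point, here is a minimal sketch of what such a setup could look like. It is not an official JUMP utility: the class name `CellImageDataset` and the `load_fn` parameter are hypothetical, and the file names in the usage note are placeholders. The class implements the `__len__`/`__getitem__` protocol that PyTorch documents for map-style datasets, so an instance could be handed to `torch.utils.data.DataLoader` once PyTorch is installed.

```python
from pathlib import Path
from typing import Callable, Sequence


class CellImageDataset:
    """Map-style dataset with one item per image file.

    Hypothetical helper, not part of any JUMP repository. It follows the
    __len__/__getitem__ protocol used by PyTorch map-style datasets.
    """

    def __init__(
        self,
        paths: Sequence[str],
        load_fn: Callable[[Path], object] = Path.read_bytes,
    ):
        # load_fn is an assumption: swap in a real image reader that
        # returns an array or tensor; the default only reads raw bytes.
        self.paths = [Path(p) for p in paths]
        self.load_fn = load_fn

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int):
        path = self.paths[index]
        # Return the loaded image together with its file name so that
        # metadata can be looked up later.
        return self.load_fn(path), path.name
```

With PyTorch available, something like `torch.utils.data.DataLoader(CellImageDataset(paths, load_fn=my_reader), batch_size=8)` should then batch the items, where `paths` and `my_reader` are whatever file list and image reader you supply.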

Thanks for the response/clarification @niranjchandrasekaran. Do you think you could also point me in the direction of the GitHub repos for cpg0001 and cpg0002? Would greatly appreciate it!

Hi @niranjchandrasekaran, wanted to follow up on this and ask how I can find the associated metadata (plate/well: treatment applied to cell, cell type, well position, etc.) of the images in cpg0001 and cpg0002. The repo for cpg0000 was quite helpful, but I cannot find anything similar for the other two. Looking forward to hearing from you soon!

Hi @asfandyarazhar13, we are currently reorganizing our repos for cpg0001 and cpg0002 so that they are consistent with the other datasets. In the meantime, if you are looking for info on what compounds were used in each experiment, that will be JUMP-MOA (https://github.com/jump-cellpainting/JUMP-MOA) in cpg0002 and both JUMP-MOA and JUMP-Target-2 (https://github.com/jump-cellpainting/JUMP-Target) in cpg0001.

Regarding cell type and other experimental conditions, since these pilots were designed for testing those conditions, they vary for each sub-experiment. You can find more details about them in https://www.biorxiv.org/content/10.1101/2022.07.13.499171v2 (cpg0001) and https://www.biorxiv.org/content/10.1101/2023.02.15.528711v1 (cpg0002).

Thanks @niranjchandrasekaran. I wanted to do some analysis using cpg0001 and cpg0002; however, it seems like that won't be possible for now since the repos are being reorganized. I guess I will have to wait for that, right? I am finding it incredibly difficult to write a script that loads the images for cpg0001 and cpg0002 together with their associated metadata features (the ones I mentioned above).

I believe you'd be reorganizing the repos for cpg0003 onwards as well then?
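For anyone attempting the kind of script described above, pairing each image with its plate/well metadata, here is a rough sketch of the join step only. It assumes the metadata can be exported as CSV with `plate` and `well` columns and that each image record already carries its plate and well. Those column names, and the helper names `index_metadata` and `attach_metadata`, are hypothetical; the actual files in cpg0001 and cpg0002 use their own formats and vary by sub-experiment.

```python
import csv
import io


def index_metadata(csv_text: str) -> dict:
    """Index metadata rows by (plate, well).

    The column names "plate" and "well" are assumptions for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["plate"], row["well"]): row for row in reader}


def attach_metadata(images: list, metadata: dict) -> list:
    """Merge each image record with its well's metadata row.

    Images whose (plate, well) key has no metadata row are skipped.
    """
    paired = []
    for image in images:
        key = (image["plate"], image["well"])
        if key in metadata:
            paired.append({**image, **metadata[key]})
    return paired
```

Each resulting record holds the image fields plus whatever columns the metadata file provides (treatment, cell type, and so on), which is the shape a data loader would typically consume.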

Hi @asfandyarazhar13, sorry to hear that you are having difficulty with the cpg0001 and cpg0002 images. Both of these datasets have several sub-experiments, which can make them difficult to navigate. Is there anything in particular you are having trouble with?

> I believe you'd be reorganizing the repos for cpg0003 onwards as well then?

Most of the other datasets in the cellpainting-gallery are old datasets (unrelated to JUMP) that were generated using the pipelines we were using at the time. It is possible that we will revisit them at some point in the future, but right now, we don't have plans to update/reorganize them.