2022_04_14_Lipocyte_Profiler (cpg0011)
htdashti opened this issue
Segmentation/Feature extraction is being performed by Cimini lab
Profile creation is being performed by Cimini lab
Data can be public in RODA immediately
Update as generated:
Link to profile repo: https://github.com/broadinstitute/2019_08_03_Adipocyte_CellPainting_Claussnitzer
Link to publication: https://www.biorxiv.org/content/10.1101/2021.07.17.452050v1
cellpainting-gallery identifier: cpg0011-lipocyteprofiler
- Metadata completely filled out in Project Profiler Database (Imaging Platform internal use only)
- Segmentation/Feature extraction complete
- Profiling complete
Transfer to CellPainting Gallery:
- Upload data to RODA (is private by default)
- Run validation script to ensure completion
- Update cellpainting-gallery/README.md
- Make RODA entry public
If data is being published, prepare for publication:
- Run Distributed-BioFormats2Raw to create .ome.zarr files
- Upload (meta)data to IDR (images remain hosted in cellpainting-gallery).
Once published:
- Make IDR entry public
- Update cellpainting-gallery/README.md and open-data-registry/cellpainting-gallery.yml to reflect publication
- Move this Issue from cellpainting-gallery-private to cellpainting-gallery. This step can be performed at an earlier point if it needs inputs from an external collaborator.
Hesam's original top post is below
Hi Shantanu,
We are organizing our LipocyteProfiler data according to the outline. We populated the branches based on our available data, and I was wondering whether the following information is sufficient for the cellpainting-gallery. If yes, how can we upload the data? If no, what information is required?
Thanks!
@htdashti in #3 and #1, we are still figuring out the exact process here. But that's going to take a while so let's proceed with a few edits
- @htdashti rename LipocyteProfiler to cpg0010-lipoprofiler or cpg0011-lipocyteprofiler (your call)
- @htdashti rename Broad-Inst to broad
- @htdashti point us to the location where we can access it on the Broad cluster and ensure we (any Broadie) can access it
- @htdashti LMK if this should be made public right away, in which case @shntnu update public_prefixes in https://github.com/jump-cellpainting/cellpainting-gallery-config/blob/main/cdk.json to add cpg0010-lipoprofiler or cpg0011-lipocyteprofiler to the list and recreate the stack.
- @shntnu update active_upload_prefixes in https://github.com/jump-cellpainting/cellpainting-gallery-config/blob/main/cdk.json to add cpg0010-lipoprofiler or cpg0011-lipocyteprofiler to the list and recreate the stack
- @shntnu ask Beth to tag someone who has access to the Broad cluster and can assume jump-cp-role, to sync the data to the location using the command below (Instructions to sync)
- @htdashti Once I finish all steps, and assuming it is ok to make public right away, you are all set with the Instructions to add to your README (below)
Instructions to sync
To actually copy the data, remove --dryrun after verifying that the dry runs look right.
source=/web/ftp/incoming/hesam/ # replace this with the name of the folder that contains `cpg0011-lipocyteprofiler`
top_level=cpg0011-lipocyteprofiler
aws s3 sync \
--dryrun \
--profile jump-cp-role \
--acl bucket-owner-full-control \
--metadata-directive REPLACE \
${source}/${top_level}/ \
s3://cellpainting-gallery/${top_level}/
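Before removing --dryrun, it can help to summarize what the dry run would transfer. The following is a minimal sketch and not part of the original instructions: count_dryrun_uploads is a hypothetical helper that counts the lines beginning with "(dryrun) upload:", which is the prefix the AWS CLI prints for each file a dry-run sync would upload.

```shell
# Hypothetical helper (not from the original thread): read dry-run output
# on stdin and print how many files the sync would upload.
count_dryrun_uploads() {
  # grep -c exits non-zero when there are no matches; "|| true" keeps the
  # helper's exit status at 0 while still printing "0".
  grep -c '^(dryrun) upload:' || true
}

# Example (requires the AWS CLI and the jump-cp-role profile):
# aws s3 sync --dryrun --profile jump-cp-role \
#   "${source}/${top_level}/" "s3://cellpainting-gallery/${top_level}/" \
#   | count_dryrun_uploads
```

Once the real sync has finished, running the same dry run again should report 0 if nothing was missed.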
Instructions to add to your README
The data are available on an S3 bucket.
They can be downloaded at no cost and no need for registration of any sort, using the command:
aws s3 sync \
--no-sign-request \
s3://cellpainting-gallery/cpg0011-lipocyteprofiler/ .
AWS CLI installation instructions can be found here
Note: If you'd like to just browse the data, it's a lot easier to do so using a storage browser.
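For a quick look from the command line without downloading anything, the same --no-sign-request flag also works with aws s3 ls. This is a small sketch; gallery_uri and list_gallery_dataset are hypothetical helper names introduced here for illustration only.

```shell
# Hypothetical helper: build the S3 URI for a dataset in the
# cellpainting-gallery bucket from its identifier.
gallery_uri() {
  printf 's3://cellpainting-gallery/%s/\n' "$1"
}

# Hypothetical helper: list the top level of a dataset without AWS
# credentials (requires the AWS CLI and network access).
list_gallery_dataset() {
  aws s3 ls --no-sign-request "$(gallery_uri "$1")"
}

# Example:
# list_gallery_dataset cpg0011-lipocyteprofiler
```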
Thanks much @shntnu
I applied the changes, and you can find the data under "/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler/".
Yes, please make these public as soon as possible
-bash:uger-c018:~ 1001 $ cd /web/ftp/incoming/hesam/
-bash:uger-c018:/web/ftp/incoming/hesam 1002 $ ls
cpg0011-lipocyteprofiler/
-bash:uger-c018:/web/ftp/incoming/hesam 1003 $ cd cpg0011-lipocyteprofiler/
-bash: cd: cpg0011-lipocyteprofiler/: Permission denied
@bethac07 Apologies! It should be fixed now.
@htdashti The metadata csvs are missing; barcode_platemap.csv is the only thing in each metadata folder. Please do add these so we can upload all in one shot, thanks!
@bethac07 Thanks much! I think they should be in order now.
we are still figuring out the exact process here. But that's going to take a while so let's proceed with a few edits
I note that @ErinWeisbart has now created the process we will follow, which is for someone in the Cimini lab (in this case) to do this (private repo):
@htdashti it looks like when you added the new files, somehow permissions were reset?
-bash:login02:~ 1001 $ cd /web/ftp/incoming/hesam/cpg0011-lipocyteprofiler/
-bash:login02:/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler 1007 $ ls broad/workspace/metadata/
ls: cannot open directory broad/workspace/metadata/: Permission denied
-bash:login02:/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler 1008 $ cd broad/workspace/metadata/
-bash: cd: broad/workspace/metadata/: Permission denied
Apologies! Not sure what happened, but I updated the permissions again.
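Since permissions were reset more than once in this thread, a check before handing off may save a round trip. The sketch below assumes access is granted through the "other" permission bits; the Broad cluster may rely on group permissions instead, in which case the mode tests would need adjusting. Both function names are hypothetical.

```shell
# Hypothetical helper: print entries under a directory that other users
# cannot read (files) or cannot read and traverse (directories).
list_unreadable() {
  find "$1" \( -type d ! -perm -o=rx \) -o \( ! -type d ! -perm -o=r \)
}

# Hypothetical helper: grant read access to everyone. The capital X adds
# execute only to directories and to files that are already executable.
make_world_readable() {
  chmod -R a+rX "$1"
}

# Example:
# list_unreadable /web/ftp/incoming/hesam/cpg0011-lipocyteprofiler
# make_world_readable /web/ftp/incoming/hesam/cpg0011-lipocyteprofiler
```

An empty result from list_unreadable after adding new files suggests the tree is still accessible.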
Upload underway!
Sync completed over the long weekend. I'm re-running it now just to make sure nothing got missed, but I think we're all set!
Great! Thank you very much!
Can you please add this to a future version of the paper:
All components of Cell Painting are available at the Cell Painting Gallery on the Registry of Open Data on AWS (https://registry.opendata.aws/cellpainting-gallery/) under accession number cpg0011.
@htdashti Someone who read the preprint reached out to me asking for the pipelines - is it ok to share the ones for the batches present in this paper? It would be great if we can just add them all straight here in the gallery.