broadinstitute / cellpainting-gallery

Cell Painting Gallery

Home Page:https://broadinstitute.github.io/cellpainting-gallery/

2022_04_14_Lipocyte_Profiler (cpg0011)

htdashti opened this issue

Segmentation/ Feature extraction is being performed by Cimini lab
Profile creation is being performed by Cimini lab
Data can be public in RODA immediately

Update as generated:

Link to profile repo: https://github.com/broadinstitute/2019_08_03_Adipocyte_CellPainting_Claussnitzer
Link to publication: https://www.biorxiv.org/content/10.1101/2021.07.17.452050v1
cellpainting-gallery identifier: cpg0011-lipocyteprofiler

  • Metadata completely filled out in Project Profiler Database (Imaging Platform internal use only)
  • Segmentation/Feature extraction complete
  • Profiling complete

Transfer to CellPainting Gallery:

  • Upload data to RODA (is private by default)
  • Run validation script to ensure completion
  • Update cellpainting-gallery/README.md
  • Make RODA entry public

If data is being published, prepare for publication:

  • Run Distributed-BioFormats2Raw to create .ome.zarr files
  • Upload (meta)data to IDR (images remain hosted in cellpainting-gallery).

Once published:

  • Make IDR entry public
  • Update cellpainting-gallery/README.md and open-data-registry/cellpainting-gallery.yml to reflect publication
  • Move this Issue from cellpainting-gallery-private to cellpainting-gallery. This step can be performed at an earlier point if it needs inputs from an external collaborator.

Hesam's original top post is below

Hi Shantanu,
We are organizing our LipocyteProfiler data according to the outline. We populated the branches based on our available data, and I was wondering whether the following information is sufficient for the cellpainting-gallery. If so, how can we upload the data? If not, what information is required?
[Screenshot, 2022-04-14 1:23:41 PM]
Thanks!

@htdashti In #3 and #1, we are still figuring out the exact process here. That's going to take a while, so let's proceed with a few edits.

Instructions to sync

To actually copy the data, remove --dryrun after verifying that the dry-run output looks right.

source=/web/ftp/incoming/hesam # replace this with the folder that contains `cpg0011-lipocyteprofiler` (no trailing slash)
top_level=cpg0011-lipocyteprofiler

aws s3 sync \
  --dryrun \
  --profile jump-cp-role \
  --acl bucket-owner-full-control \
  --metadata-directive REPLACE \
  "${source}/${top_level}/" \
  "s3://cellpainting-gallery/${top_level}/"
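To make the dry-run-then-copy routine harder to get wrong, the same command can be wrapped in a small function. This is a sketch, not part of the gallery tooling: `upload_dataset` and the `AWS_BIN` override are hypothetical names, and the only safety check it adds is refusing to run when the dataset folder is missing locally, which would catch a mistyped identifier such as cpg0010 vs cpg0011.

```shell
#!/bin/sh
# Hypothetical wrapper around the `aws s3 sync` upload command.
# AWS_BIN is an assumed override (default: the real `aws` CLI);
# AWS_BIN=echo prints the command instead of running it.
upload_dataset() {
  local source_dir="${1%/}"   # folder that contains the dataset folder
  local top_level="$2"        # e.g. cpg0011-lipocyteprofiler
  local mode="${3:-dryrun}"   # pass "apply" to really copy
  local dryrun_flag="--dryrun"

  # Refuse to run if the dataset folder is not where we expect it.
  if [ ! -d "$source_dir/$top_level" ]; then
    echo "error: $source_dir/$top_level not found" >&2
    return 1
  fi
  if [ "$mode" = "apply" ]; then
    dryrun_flag=""
  fi

  # $dryrun_flag is left unquoted on purpose so it vanishes when empty.
  "${AWS_BIN:-aws}" s3 sync \
    $dryrun_flag \
    --profile jump-cp-role \
    --acl bucket-owner-full-control \
    --metadata-directive REPLACE \
    "$source_dir/$top_level/" \
    "s3://cellpainting-gallery/$top_level/"
}
```

Run it first with two arguments (for example `upload_dataset /web/ftp/incoming/hesam cpg0011-lipocyteprofiler`), read the dry-run output, then rerun with `apply` as the third argument.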

Instructions to add to your README

The data are available on an S3 bucket.
They can be downloaded at no cost, with no registration of any sort, using the command:

aws s3 sync \
  --no-sign-request \
  s3://cellpainting-gallery/cpg0011-lipocyteprofiler/ . 

AWS CLI installation instructions can be found here

Note: If you'd like to just browse the data, it's a lot easier to do so using a storage browser.
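If a reader only needs part of the dataset, the same anonymous access works for a single folder. The sketch below builds on the README command; the helper names and the `AWS_BIN` override are hypothetical, and `broad/workspace/metadata` is used only because that folder shows up later in this thread. Other prefixes should be discovered with `list_dataset`.

```shell
#!/bin/sh
# Sketch: browse or fetch only part of cpg0011 anonymously.
# dataset_path, list_dataset and download_subset are hypothetical helpers;
# AWS_BIN is an assumed override (default: aws).
GALLERY_PREFIX="s3://cellpainting-gallery/cpg0011-lipocyteprofiler"

# Build the S3 URL for a folder inside the dataset (always ending in "/").
dataset_path() {
  if [ -n "${1:-}" ]; then
    printf '%s/%s/\n' "$GALLERY_PREFIX" "${1%/}"
  else
    printf '%s/\n' "$GALLERY_PREFIX"
  fi
}

# List one level of a folder, e.g. list_dataset broad/workspace
list_dataset() {
  "${AWS_BIN:-aws}" s3 ls --no-sign-request "$(dataset_path "${1:-}")"
}

# Download one folder, e.g. download_subset broad/workspace/metadata ./metadata
download_subset() {
  local dest="${2:-.}"
  "${AWS_BIN:-aws}" s3 sync --no-sign-request "$(dataset_path "$1")" "$dest"
}
```

Keeping the trailing slash on every prefix means `aws s3 ls` lists the folder's contents rather than matching it as a key prefix.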

Thanks much @shntnu!
I applied the changes, and you can find the data under "/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler/".
Yes, please make these public as soon as possible.

@htdashti

-bash:uger-c018:~ 1001 $ cd /web/ftp/incoming/hesam/
-bash:uger-c018:/web/ftp/incoming/hesam 1002 $ ls
cpg0011-lipocyteprofiler/
-bash:uger-c018:/web/ftp/incoming/hesam 1003 $ cd cpg0011-lipocyteprofiler/
-bash: cd: cpg0011-lipocyteprofiler/: Permission denied
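A "Permission denied" like this typically means a directory is missing the read/execute bits for the account doing the upload. Here is a sketch of a check a contributor could run before handing a folder over. `find_unreadable` and `open_for_upload` are hypothetical names, and granting "other" access is an assumption; use `g+rX` instead if you share a Unix group with the uploader.

```shell
#!/bin/sh
# Sketch: list anything another account could not read, then open it up.
# Granting "other" access is an assumption; g+rX may be more appropriate.

# Print directories missing o+rx (octal 005) and files missing o+r (004).
find_unreadable() {
  find "$1" \( -type d ! -perm -005 \) -o \( ! -type d ! -perm -004 \)
}

# Add read for others; capital X adds execute only on directories
# (and on files that are already executable by someone).
open_for_upload() {
  chmod -R o+rX "$1"
}
```

Running `find_unreadable` again after adding new files is worthwhile, because newly created files and folders get whatever default permissions the creating account uses.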

@bethac07 Apologies! It should be fixed now.

@htdashti The metadata CSVs are missing; barcode_platemap.csv is the only file in each metadata folder. Please add them so we can upload everything in one shot, thanks!

@bethac07 Thanks much! I think they should be in order now.
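The missing-metadata problem above can be caught before upload with a quick local check. This sketch assumes each batch keeps its barcode_platemap.csv at the top of its own metadata folder, next to the rest of that batch's metadata. `incomplete_metadata_dirs` is a hypothetical name, so adjust it if your layout differs.

```shell
#!/bin/sh
# Sketch: print metadata folders whose only file is barcode_platemap.csv.
# Assumes barcode_platemap.csv sits at the top of each batch's metadata folder.
incomplete_metadata_dirs() {
  find "$1" -type f -name barcode_platemap.csv | while IFS= read -r barcode_file; do
    dir=$(dirname "$barcode_file")
    # Count all other files anywhere below this folder.
    others=$(find "$dir" -type f ! -name barcode_platemap.csv | wc -l | tr -d ' ')
    if [ "$others" -eq 0 ]; then
      printf '%s\n' "$dir"
    fi
  done
}
```

For example, `incomplete_metadata_dirs cpg0011-lipocyteprofiler/broad/workspace/metadata` prints nothing when every folder holding a barcode_platemap.csv also contains at least one other file.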

> we are still figuring out the exact process here. But that's going to take a while so let's proceed with a few edits

I note that @ErinWeisbart has now created the process we will follow, in which someone in the Cimini lab (in this case) opens an issue from this template (private repo):

https://github.com/broadinstitute/cellpainting-gallery-private/issues/new?assignees=&labels=&template=data-immediately-public.md&title=YYYY_MM_DD_Dataset_Name

@htdashti it looks like when you added the new files, somehow permissions were reset?

-bash:login02:~ 1001 $ cd /web/ftp/incoming/hesam/cpg0011-lipocyteprofiler/
-bash:login02:/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler 1007 $ ls broad/workspace/metadata/
ls: cannot open directory broad/workspace/metadata/: Permission denied
-bash:login02:/web/ftp/incoming/hesam/cpg0011-lipocyteprofiler 1008 $ cd broad/workspace/metadata/
-bash: cd: broad/workspace/metadata/: Permission denied

Apologies! Not sure what happened, but I have updated the permissions again.

Upload underway!

Sync completed over the long weekend, re-running now just to make sure nothing got missed, but I think we're all set!
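One way to make the "re-run to make sure nothing got missed" step give a yes/no answer is to re-run the sync as a dry run and count the pending transfers. `aws s3 sync --dryrun` prints one "(dryrun) upload: ..." line per file it would still copy, so a count of 0 suggests the bucket already matches the local folder. `pending_uploads` and `AWS_BIN` are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: count files a dry-run sync would still upload.
# AWS_BIN is an assumed override (default: aws), handy for testing.
pending_uploads() {
  local source_dir="${1%/}" top_level="$2"
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that quiet.
  "${AWS_BIN:-aws}" s3 sync --dryrun --profile jump-cp-role \
    "$source_dir/$top_level/" "s3://cellpainting-gallery/$top_level/" \
    | grep -c '^(dryrun) upload:' || true
}
```

Because the count comes from a pipeline, an `aws` failure (expired credentials, wrong profile) would also print 0, so check stderr for errors before trusting a zero.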

Great! Thank you very much!

@htdashti

Can you please add this to a future version of the paper:

All components of Cell Painting are available at the Cell Painting Gallery on the Registry of Open Data on AWS (https://registry.opendata.aws/cellpainting-gallery/) under accession number cpg0011.

@htdashti Someone who read the preprint reached out to me asking for the pipelines. Is it OK to share the ones for the batches in this paper? It would be great if we could add them all directly to the gallery.

@shntnu Thanks for the reminder. It is in the resubmitted materials.
@bethac07 Absolutely, thank you! It is a great suggestion; if you don't mind, please add them to the gallery. Please let me know if I can help with this.