jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium

Some images seem to be missing in the first batches of source 4 (cpg0016).

Arkkienkeli opened this issue · comments

Hello, some images might be missing in the first batches.

Example:

aws s3 cp --no-sign-request s3://cellpainting-gallery/cpg0016-jump/source_4/images/2021_06_07_Batch5/images/BR00123962__2021-06-17T04_57_17-Measurement1/Images/r16c04f01p01-ch4sk1fk1fl1.tiff r16c04f01p01-ch4sk1fk1fl1.tiff
fatal error: An error occurred (404) when calling the HeadObject operation: Key "cpg0016jump/source_4/images/2021_06_07_Batch5/images/BR00123962__2021-06-17T04_57_17-Measurement1/Images/r16c04f01p01-ch4sk1fk1fl1.tiff" does not exist

However, this file is listed in load_data_with_illum.parquet for this plate.

Could you please check that the images are in place? Especially for the first five batches.
Unfortunately, I don't know how to quickly check all such cases across the whole source.

According to my checks, batches 6-13 (except 12, which we don't analyze) are OK.
In batch 5, only the image r16c04f01p01-ch4sk1fk1fl1.tiff seems to be missing.
I don't know about batches 1-4.

Okay, looks like it works. AWS likes to skip files sometimes, and I used the wrong link when I tried to reproduce the problem.

@Arkkienkeli Glad this got sorted out.

AWS likes to skip files sometimes

Can you please clarify? The information could be beneficial for others

Sure. When I ran aws s3 cp to copy a whole batch to my local machine and then checked that a file exists for each image listed in the metadata, I usually found a few files missing. In that case I had to download those images again. Usually 1 to 5 images are skipped after first download attempt, but not always.
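
For anyone who wants to script that existence check, here is a minimal sketch. It assumes the image paths from the metadata have already been exported to a plain-text file with one relative path per line; that export step, the file names, and the helper name find_missing are all hypothetical and not part of the dataset tooling.

```shell
# find_missing LIST_FILE LOCAL_DIR
# Prints every path from LIST_FILE (one relative path per line) that does not
# exist as a regular file under LOCAL_DIR. LIST_FILE is an assumed plain-text
# export of the image paths in the metadata; this script does not create it.
find_missing() {
    fm_list=$1
    fm_dir=$2
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r fm_rel || [ -n "$fm_rel" ]; do
        # Skip blank lines in the list.
        [ -n "$fm_rel" ] || continue
        # Report the path only when the local file is absent.
        [ -f "$fm_dir/$fm_rel" ] || printf '%s\n' "$fm_rel"
    done < "$fm_list"
}
```

Calling it as find_missing expected.txt ./Images prints the paths that still need to be downloaded; empty output means every listed image is present locally.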

Usually 1 to 5 images are skipped after first download attempt, but not always.

Ah, OK. I strongly recommend using aws s3 sync for this reason.
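
A minimal sketch of that recommendation follows; the wrapper name sync_prefix is hypothetical and only there for illustration. Because aws s3 sync transfers only files that are missing or different at the destination, running it again after an incomplete download should fetch just the gaps instead of the whole batch.

```shell
# sync_prefix S3_PREFIX LOCAL_DIR
# Mirrors a public S3 prefix into LOCAL_DIR without AWS credentials.
# Safe to re-run: files already present and unchanged are not copied again.
sync_prefix() {
    aws s3 sync --no-sign-request "$1" "$2"
}
```

For the batch discussed in this thread, a call could look like sync_prefix s3://cellpainting-gallery/cpg0016-jump/source_4/images/2021_06_07_Batch5/images/ ./2021_06_07_Batch5/ (the local directory name is an arbitrary choice).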