Some images seem to be missing in the first batches of source 4 (cpg0016).
Arkkienkeli opened this issue · comments
Hello, some images might be missing in the first batches.
Example:
aws s3 cp --no-sign-request s3://cellpainting-gallery/cpg0016-jump/source_4/images/2021_06_07_Batch5/images/BR00123962__2021-06-17T04_57_17-Measurement1/Images/r16c04f01p01-ch4sk1fk1fl1.tiff r16c04f01p01-ch4sk1fk1fl1.tiff
fatal error: An error occurred (404) when calling the HeadObject operation: Key "cpg0016jump/source_4/images/2021_06_07_Batch5/images/BR00123962__2021-06-17T04_57_17-Measurement1/Images/r16c04f01p01-ch4sk1fk1fl1.tiff" does not exist
However, this file is listed in load_data_with_illum.parquet for this plate.
Could you please check that the images are in place, especially for the first five batches?
Unfortunately, I don't know how to check all of these quickly across the whole source.
cc @shntnu
According to my checks, batches 6-13 (except 12, which we don't analyze) are ok.
In batch 5, only the image r16c04f01p01-ch4sk1fk1fl1.tiff seems to be missing.
I don't know about batches 1-4.
Okay, it looks like it works. AWS likes to skip files sometimes, and I used the wrong link when I tried to reproduce the problem.
@Arkkienkeli Glad this got sorted out.
AWS likes to skip files sometimes
Can you please clarify? The information could be beneficial for others
Sure. When I used aws s3 cp to download a whole batch to a local machine, I would usually find a few files missing when I checked that a file exists for each image listed in the metadata. In that case I had to download those images again. Usually 1 to 5 images are skipped after first download attempt, but not always.
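For anyone who wants to automate the existence check described above, here is a minimal sketch. It assumes the expected relative paths have already been exported from the plate metadata into a plain text file, one path per line. That export step, the file name, and the function name list_missing are hypothetical and not part of the dataset tooling.

```shell
# Print every expected path that is absent under a local download directory.
# $1: text file with one expected relative path per line (hypothetical export
#     from the plate metadata)
# $2: local directory the batch was downloaded into
list_missing() {
  list_file="$1"
  local_dir="$2"
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r rel_path || [ -n "$rel_path" ]; do
    # Ignore blank lines in the list.
    if [ -z "$rel_path" ]; then
      continue
    fi
    # Report the path when no regular file exists at the expected location.
    if [ ! -f "${local_dir}/${rel_path}" ]; then
      printf '%s\n' "$rel_path"
    fi
  done < "$list_file"
}
```

Calling list_missing expected.txt ./Batch5 would print only the paths that still need to be downloaded again. An empty output means every listed file is present locally.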
Usually 1 to 5 images are skipped after first download attempt, but not always.
Ah ok. I strongly recommend using aws s3 sync for this reason.
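As a rough sketch of that recommendation: aws s3 sync copies only files that are new or changed relative to the destination, so running it again after an incomplete download fetches just the files that were skipped. The wrapper below is illustrative only. The function name sync_batch and the local destination are assumptions, and the bucket prefix is taken from the command at the top of this issue.

```shell
# Sync one batch of source_4 images into a local directory.
# $1: batch folder name, e.g. 2021_06_07_Batch5
# $2: local destination directory (hypothetical, choose your own)
sync_batch() {
  batch="$1"
  dest="$2"
  # --no-sign-request allows anonymous access to the public bucket.
  aws s3 sync --no-sign-request \
    "s3://cellpainting-gallery/cpg0016-jump/source_4/images/${batch}/images/" \
    "${dest}"
}

# Example call (left commented out so nothing is downloaded by accident):
# sync_batch 2021_06_07_Batch5 ./Batch5
```

Running the same call a second time should transfer only files that are still missing locally. Adding --dryrun to the aws s3 sync command lists what would be copied without downloading anything.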