jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium

Clarify why some compounds have multiple replicates

ChenyuWang-Monica opened this issue

When I'm counting the replicates of each compound in the COMPOUND plates, I have a few questions:

  1. The top ten compounds each have more than 6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKeys of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control, while dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N), which is listed as a positive control, doesn't appear in the metadata file compound.csv.gz.

  2. The 11th-ranked compound, JCP2022_033954, has 1594 replicates. Is it also a positive control, or does it serve some other purpose?

  3. There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have many more replicates than the typical number mentioned in the paper (i.e. about 5)?

Thanks!

Hi @ChenyuWang-Monica, my answers are below.

The top ten compounds each have more than 6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKeys of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control, while dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N), which is listed as a positive control, doesn't appear in the metadata file compound.csv.gz.

We have been having some issues matching InChIKeys between what we previously released in the JUMP-Target repo and what we released in this repo. But I can confirm that JCP2022_025848 is dexamethasone. The mapping between JCP2022 IDs and compound names is below.

| Metadata_JCP2022 | Metadata_InChIKey | poscon_pert_iname | JUMP_Target_InChIKey |
|---|---|---|---|
| JCP2022_085227 | SRVFFFJZQVENJC-UHFFFAOYSA-N | aloxistatin | SRVFFFJZQVENJC-IHRRRGAJSA-N |
| JCP2022_037716 | IVUGFMLRJOCGAS-UHFFFAOYSA-N | AMG900 | IVUGFMLRJOCGAS-UHFFFAOYSA-N |
| JCP2022_025848 | GJFCONYVAUNLKB-UHFFFAOYSA-N | dexamethasone | UREBDLICKHMUKA-CXSFZGCWSA-N |
| JCP2022_046054 | KPBNHDGDUADAGP-UHFFFAOYSA-N | FK-866 | KPBNHDGDUADAGP-VAWYXSNFSA-N |
| JCP2022_035095 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N | LY2109761 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N |
| JCP2022_064022 | OINGHOPGNMYCAB-UHFFFAOYSA-N | NVS-PAK1-1 | OINGHOPGNMYCAB-INIZCTEOSA-N |
| JCP2022_050797 | LOUPRKONTZGTKE-UHFFFAOYSA-N | quinidine | LOUPRKONTZGTKE-LHHVKLHASA-N |
| JCP2022_012818 | CQKBSRPVZZLCJE-UHFFFAOYSA-N | TC-S-7004 | CQKBSRPVZZLCJE-UHFFFAOYSA-N |
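One way to see where the two sets of keys diverge is to compare only the first block of each InChIKey (the 14 characters before the first hyphen, which encode the molecular skeleton) separately from the full key. The following is a minimal sketch using the rows of the table above; the function names are illustrative and not part of either repo, and the sketch only classifies the mismatches without explaining their cause.

```python
# Rows copied from the table above:
# (JCP2022 ID, InChIKey in this repo, compound name, InChIKey in JUMP-Target)
POSCONS = [
    ("JCP2022_085227", "SRVFFFJZQVENJC-UHFFFAOYSA-N", "aloxistatin", "SRVFFFJZQVENJC-IHRRRGAJSA-N"),
    ("JCP2022_037716", "IVUGFMLRJOCGAS-UHFFFAOYSA-N", "AMG900", "IVUGFMLRJOCGAS-UHFFFAOYSA-N"),
    ("JCP2022_025848", "GJFCONYVAUNLKB-UHFFFAOYSA-N", "dexamethasone", "UREBDLICKHMUKA-CXSFZGCWSA-N"),
    ("JCP2022_046054", "KPBNHDGDUADAGP-UHFFFAOYSA-N", "FK-866", "KPBNHDGDUADAGP-VAWYXSNFSA-N"),
    ("JCP2022_035095", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N", "LY2109761", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N"),
    ("JCP2022_064022", "OINGHOPGNMYCAB-UHFFFAOYSA-N", "NVS-PAK1-1", "OINGHOPGNMYCAB-INIZCTEOSA-N"),
    ("JCP2022_050797", "LOUPRKONTZGTKE-UHFFFAOYSA-N", "quinidine", "LOUPRKONTZGTKE-LHHVKLHASA-N"),
    ("JCP2022_012818", "CQKBSRPVZZLCJE-UHFFFAOYSA-N", "TC-S-7004", "CQKBSRPVZZLCJE-UHFFFAOYSA-N"),
]


def first_block(inchikey):
    """Return the part of an InChIKey before the first hyphen."""
    return inchikey.split("-")[0]


def compare_keys(rows):
    """Classify each compound by how its two InChIKeys relate."""
    statuses = {}
    for _jcp_id, repo_key, name, target_key in rows:
        if repo_key == target_key:
            statuses[name] = "identical"
        elif first_block(repo_key) == first_block(target_key):
            statuses[name] = "first block matches"
        else:
            statuses[name] = "differs"
    return statuses


statuses = compare_keys(POSCONS)
for name, status in statuses.items():
    print(f"{name}: {status}")
```

On these eight rows, dexamethasone is the only compound whose keys differ in the first block, which is why matching on either the full key or the first block fails for it and the JCP2022 ID has to be used instead.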

The 11th-ranked compound, JCP2022_033954, has 1594 replicates. Is it also a positive control, or does it serve some other purpose?

Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in #30 (comment), but I don't know whether we flagged this as a metadata error or not.
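A check like the one described above (which sources the wells come from, and whether they all sit in columns 1, 24, 25 or 48) could be sketched as follows. This is only an illustration: the column names `Metadata_Source` and `Metadata_Well` are assumptions about the well-level metadata, the well-name format (row letters followed by a column number) is also assumed, `source_X` is a placeholder, and the rows are made-up toy data, not real records.

```python
import re
from collections import Counter

# Plate columns mentioned in the comment above.
FLAGGED_COLUMNS = {1, 24, 25, 48}


def well_column(well):
    """Extract the column number from a well name such as 'A01' or 'AF48' (assumed format)."""
    match = re.fullmatch(r"[A-Za-z]+(\d+)", well)
    if match is None:
        raise ValueError(f"Unrecognized well name: {well!r}")
    return int(match.group(1))


def summarize_wells(rows, compound):
    """Count wells per source for one compound and collect wells outside the flagged columns."""
    by_source = Counter()
    outside_flagged = []
    for row in rows:
        if row["Metadata_JCP2022"] != compound:
            continue
        by_source[row["Metadata_Source"]] += 1
        if well_column(row["Metadata_Well"]) not in FLAGGED_COLUMNS:
            outside_flagged.append(row)
    return by_source, outside_flagged


# Toy rows for illustration only (hypothetical, not taken from the real metadata).
toy_rows = [
    {"Metadata_Source": "source_9", "Metadata_Well": "A01", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "B24", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "AF48", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_X", "Metadata_Well": "P24", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "C07", "Metadata_JCP2022": "JCP2022_999999"},
]

by_source, outside_flagged = summarize_wells(toy_rows, "JCP2022_033954")
print(by_source.most_common())
print(len(outside_flagged), "wells outside the flagged columns")
```

An empty `outside_flagged` list together with one dominant source in `by_source` would match the pattern described in the comment.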

There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have many more replicates than the typical number mentioned in the paper (i.e. about 5)?

In general, most compounds should have five replicates, but there are exceptions. Some of them are listed below.

  • Source 7 did not exchange compounds with the other sources, but some of its compounds overlap with theirs. These compounds will have more than 5 replicates.
  • Around 2000 compounds are common to the wave 1 and wave 2 sources (you can find more information about the two waves of sources in the manuscript). These compounds will also have more than 5 replicates.
  • One of the sources needed multiple replicates of the compounds that it nominated. When these compounds were exchanged with other sources, we ended up with more than 5 replicates of them.
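To see which of these cases applies to a given compound, it can help to count replicates per compound together with the set of sources that contributed them. Below is a minimal sketch under the same assumptions as before: `Metadata_Source` is an assumed column name, the compound IDs `cmpd_A` and `cmpd_B` are placeholders, and the rows are toy data for illustration.

```python
from collections import defaultdict


def over_replicated(rows, expected=5):
    """Return {compound: (replicate count, sorted sources)} for compounds above the expected count."""
    counts = defaultdict(int)
    sources = defaultdict(set)
    for row in rows:
        compound = row["Metadata_JCP2022"]
        counts[compound] += 1
        sources[compound].add(row["Metadata_Source"])
    return {
        compound: (n, sorted(sources[compound]))
        for compound, n in counts.items()
        if n > expected
    }


# Toy rows (hypothetical): cmpd_A has 5 replicates from one source,
# cmpd_B has 5 replicates from one source plus 2 more from source_7.
toy_wells = (
    [{"Metadata_JCP2022": "cmpd_A", "Metadata_Source": "source_2"}] * 5
    + [{"Metadata_JCP2022": "cmpd_B", "Metadata_Source": "source_2"}] * 5
    + [{"Metadata_JCP2022": "cmpd_B", "Metadata_Source": "source_7"}] * 2
)

extra = over_replicated(toy_wells)
for compound, (n, srcs) in extra.items():
    print(compound, n, srcs)
```

Grouping by source in this way shows at a glance whether the extra replicates come from source 7, from sources in both waves, or from a single source that plated its own nominations several times.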

Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in #30 (comment), but I don't know whether we flagged this as a metadata error or not.

Indeed, I'm not sure why this was the case. I'll follow up in that internal issue and report back here.