buenoalvezm / Pan-cancer-profiling

This repository contains the code to reproduce our exploratory analysis of a pan-cancer plasma proteomics dataset, including differential expression analysis and disease classification (https://doi.org/10.1038/s41467-023-39765-y). The data can be explored in the Human Protein Atlas: www.proteinatlas.org/humanproteome/disease.

Missing metadata

TimoLassmann opened this issue

Hi,

Great study, and thank you for providing the data and code to replicate your results. Unfortunately, I cannot find the metadata for the data file on BioStudies (S-BSST935). The data file contains a unique ID for each patient, but there is no clear way of associating these IDs with the metadata in the Source data / supplementary Excel sheet (tab 1a).

It would be great if you could provide a table linking the metadata unambiguously to the entries in S-BSST935.

Thanks, T

Hi,

Thank you for your interest in our study. 
The ethical approval allows us to show the individual protein levels for the different diseases, but not to link them to clinical metadata. The data on protein levels are provided as an open-access resource without restrictions, and we hope they can contribute to your research.

Best regards, María

Thank you for your response. I should have been more specific in my query.

Supplementary table 2 contains the following entries:
Cancer Sex Age Stage Grade
AML Male (30-40) 3
AML Male (30-40) 1
etc...

The data file "pancancer_olink_data_biostudies_v2.txt" contains the following entries:

"Sample_ID" "Cancer" "Assay" "OlinkID" "UniProt" "Panel" "NPX"
"AML_1" "AML" "AARSD1" "OID21311" "Q9BTE6" "Oncology" 5.01745
etc...

How can I link AML_1 to a row in the supplementary table? Can I assume that the first row in Supplementary Table 2 is AML_1, the second is AML_2, and so forth? It would be great to link these data via a unique identifier to remove ambiguities.
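
To make the shape of the data file concrete, here is a minimal sketch that groups the long-format rows by "Sample_ID". It assumes the fields are separated by single spaces and quoted as in the excerpt above (the real file may be tab-delimited, in which case the delimiter argument would need to change). The function name load_npx is hypothetical, and the only data used is the header and the single row quoted in this thread. The sketch does not establish any link to Supplementary Table 2; it only shows that "Sample_ID" and "Cancer" are the sole per-sample keys present in the file.

```python
import csv
import io
from collections import defaultdict

# Header and the single data row quoted in this thread.
# The real file contains many more rows; nothing else is assumed about them.
SAMPLE = '''"Sample_ID" "Cancer" "Assay" "OlinkID" "UniProt" "Panel" "NPX"
"AML_1" "AML" "AARSD1" "OID21311" "Q9BTE6" "Oncology" 5.01745
'''


def load_npx(handle, delimiter=" "):
    """Group long-format rows into {Sample_ID: {Assay: NPX}}.

    Also returns {Sample_ID: Cancer}. The delimiter is an assumption.
    """
    reader = csv.DictReader(handle, delimiter=delimiter, quotechar='"')
    samples = defaultdict(dict)
    cancers = {}
    for row in reader:
        samples[row["Sample_ID"]][row["Assay"]] = float(row["NPX"])
        cancers[row["Sample_ID"]] = row["Cancer"]
    return dict(samples), cancers


samples, cancers = load_npx(io.StringIO(SAMPLE))
print(samples)
print(cancers)
```

For the actual file, the same function could be called on an opened file handle, for example with open("pancancer_olink_data_biostudies_v2.txt", newline="").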

Dear Timo,

Thank you for your detailed inquiry and for your continued interest in our study. We appreciate your thorough examination of the data, but unfortunately, we are unable to provide a direct link between the individual entries in Supplementary Table 2 and the data file "pancancer_olink_data_biostudies_v2.txt" due to the restrictions in place.

Given the limitations, I will be closing this GitHub issue. If you have any other questions or need further assistance, please don't hesitate to reach out.

Best regards,
María

Dear María,

Without this 'linking' data, it seems to me that it is impossible to reproduce any of the results presented in the paper. Essentially, you are providing two pieces of data that are of little use in isolation, because the critical link between them is missing.

Please let me know if my understanding is correct.

Thanks, T