DearCaat / RRT-MIL

[CVPR 2024] Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Labels for cancer subtyping

JackyC666 opened this issue:

Hi, how did you label the BRCA dataset? Your paper says:

  "TCGA-BRCA includes two sub-types of cancers, Invasive Ductal Carcinoma (IDC) and Invasive Lobular Carcinoma (ILC). There are 779 IDC slides and 198 ILC slides."

So how can I reproduce these labels? I can't find this information anywhere in the TCGA dataset.
Thanks!

I got these labels from the clinical.json file obtained from the GDC website, specifically:

  • First, I downloaded the clinical.json file for the Diagnostic Slide data of the TCGA-BRCA project from the official GDC website.
  • Second, I extracted the case_id and primary_diagnosis fields for each case from that file.
  • Finally, I labeled each case as IDC or ILC depending on whether its primary_diagnosis field contained the IDC or the ILC subtype keyword. This gave me the BRCA labels I am using now.
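The three steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code used for the paper: it assumes clinical.json is a JSON list of case objects, each with a case_id and a diagnoses list whose entries carry primary_diagnosis. The keyword rules (IDC_KEYWORD, ILC_KEYWORD) and the helper names (label_from_diagnosis, build_labels, load_labels) are hypothetical, because the thread does not say which exact strings were matched.

```python
import json
from typing import Optional

# ASSUMPTION: illustrative keyword rules; the thread does not specify which
# primary_diagnosis strings were mapped to IDC or ILC.
IDC_KEYWORD = "duct"
ILC_KEYWORD = "lobular"


def label_from_diagnosis(primary_diagnosis: Optional[str]) -> Optional[str]:
    """Map a primary_diagnosis string to 'IDC', 'ILC', or None (excluded)."""
    if not primary_diagnosis:
        return None
    text = primary_diagnosis.lower()
    has_idc = IDC_KEYWORD in text
    has_ilc = ILC_KEYWORD in text
    # Mixed ductal/lobular diagnoses match both keywords, so they are skipped.
    if has_idc and not has_ilc:
        return "IDC"
    if has_ilc and not has_idc:
        return "ILC"
    return None


def build_labels(cases: list) -> dict:
    """Return {case_id: label} for cases whose diagnosis maps to IDC or ILC."""
    labels = {}
    for case in cases:
        case_id = case.get("case_id")
        # ASSUMPTION: primary_diagnosis sits inside a "diagnoses" list per case.
        for diagnosis in case.get("diagnoses", []):
            label = label_from_diagnosis(diagnosis.get("primary_diagnosis"))
            if case_id and label:
                labels[case_id] = label
                break  # keep the first diagnosis that yields a label
    return labels


def load_labels(path: str) -> dict:
    """Read a clinical.json export (assumed to be a JSON list of cases)."""
    with open(path, encoding="utf-8") as f:
        return build_labels(json.load(f))
```

For example, `load_labels("clinical.json")` would return a dict mapping each case_id to "IDC" or "ILC". Diagnoses that match both keywords are dropped rather than forced into one class; whether this reproduces the paper's 779/198 split is not confirmed here.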

I haven't found any other way to derive these labels, so I'm not sure this approach is entirely appropriate. If you have a better idea, feel free to leave a comment.

Thanks for your reply!
Where can I find the primary_diagnosis field?
Could you give me a link, or a screenshot to illustrate?
Thank you sincerely for your help!

Sorry, I meant a file named clinical.json, which is also downloaded from the GDC site. primary_diagnosis is one of the fields in this JSON file, like this:
[Screenshot: excerpt of clinical.json showing the primary_diagnosis field]
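Before choosing keywords, it may help to list every distinct primary_diagnosis value in the file along with how often it occurs. The sketch below assumes the same layout as above (a JSON list of cases, each with a diagnoses list); diagnosis_counts is a hypothetical helper, not part of any GDC tool.

```python
from collections import Counter


def diagnosis_counts(cases: list) -> Counter:
    """Count how often each primary_diagnosis value occurs across cases."""
    counts = Counter()
    for case in cases:
        # ASSUMPTION: each case holds a "diagnoses" list of dicts.
        for diagnosis in case.get("diagnoses", []):
            value = diagnosis.get("primary_diagnosis")
            if value:
                counts[value] += 1
    return counts
```

Loading the file with `json.load` and calling `diagnosis_counts(...).most_common()` lists the values by frequency. These counts are per case, while the paper reports slide counts, and a TCGA case can have more than one diagnostic slide, so the totals need not match 779 and 198 directly.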