Low AUPR values in analyses

Question

APuchkina opened this issue 2 months ago

Dear Saeyslab team,

Thank you for creating the great package!

I have been following the vignettes to analyse scRNA-seq data from mouse embryonic tissues. I am mostly interested in differentiation processes during development, so I am basing the analyses on differential expression (DE) between different (progenitor) cell types.
However, I have noticed that the AUPR values in my analyses are 4- to 10-fold lower than those in the vignette. So my question is: can these values still be considered meaningful?
I do find some known ligand-target interactions in the analyses, which is reassuring.

Thanks in advance!

Arina

Daniele Pittari · Answer 1 · Fri Aug 02 2024 01:15:40 GMT+0800 (China Standard Time)

Dear user,
we have observed that the nominal AUPR values can vary considerably across datasets. This is why we do not recommend setting a hard cutoff. If known ligand-target interactions are found among the top-ranked ones by AUPR, this can be considered an encouraging finding.

Best,
Daniele

APuchkina · Answer 2 · Wed Aug 07 2024 15:18:46 GMT+0800 (China Standard Time)

Dear Daniele,

Thanks for the fast reply!

APuchkina · Answer 3 · Tue Sep 03 2024 22:04:32 GMT+0800 (China Standard Time)

Dear Saeyslab team,

I have an additional question regarding the AUPR values. I was trying to understand exactly how they are calculated, but could not fully figure it out (this might be due to my limited knowledge of statistics).

The vignette says: "Ligands are ranked based on the area under the precision-recall curve (AUPR) between a ligand’s target predictions and the observed transcriptional response."

As far as I understand, to make a PR curve you need to count the true positives, false positives and false negatives, which then lets you calculate both precision and recall. Do I understand correctly that the model predictions for each ligand and its targets are compared to the observed expression of the ligand and the targets, with the latter taken as ground truth?

The reason I am asking is that the ligands with the highest AUPR don't always have the highest regulatory potential or the most downstream target genes in the heatmap generated with nichenet_output$ligand_target_heatmap. So would it be correct to say that the ligands with the highest AUPR are the ones for which the model predicts the data best, and that the heatmap just shows the regulatory potential of each ligand within the model?
And would it then make sense to first pick the top n ligands based on the AUPR, and then 'filter' further based on the regulatory potential in the heatmap? It is not really feasible to test 40 ligands experimentally.
Or would it be better to just select a lower n for the top ligands based on the AUPR and look into those?

Thanks in advance!

Arina
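
The precision-recall computation asked about above can be illustrated with a minimal base-R sketch. It assumes a binary ground truth (whether each background gene belongs to the gene set of interest) and one ligand's prediction score per gene. All numbers are made up, `pr_curve` and `average_precision` are hypothetical helper names, and average precision is only one common estimator of AUPR, so values may differ slightly from what the package reports.

```r
# Toy data (made up, not NicheNet output):
# scores     = hypothetical prediction scores of one ligand for 10 background genes
# in_geneset = TRUE if the gene is in the gene set of interest (the ground truth)
scores     <- c(0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05)
in_geneset <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Precision and recall when the top-k scoring genes are called "targets".
# Assumes distinct scores (ties are not handled in this sketch).
pr_curve <- function(scores, labels) {
  labels <- labels[order(scores, decreasing = TRUE)]
  tp <- cumsum(labels)     # true positives among the top k
  fp <- cumsum(!labels)    # false positives among the top k
  data.frame(k = seq_along(labels),
             precision = tp / (tp + fp),
             recall = tp / sum(labels))
}

# Average precision: mean precision at the ranks where a true positive occurs.
average_precision <- function(scores, labels) {
  pr <- pr_curve(scores, labels)
  hits <- labels[order(scores, decreasing = TRUE)]
  mean(pr$precision[hits])
}

ap <- average_precision(scores, in_geneset)
baseline <- mean(in_geneset)  # expected value for a random ranking (fraction of positives)
```

In this toy case the true positives sit at ranks 1, 3 and 6, so `ap` is the mean of 1, 2/3 and 1/2, compared with a random-ranking baseline of 0.3.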

ldblack · Answer 4 · Fri Sep 06 2024 10:09:31 GMT+0800 (China Standard Time)

I'm also encountering problems with low AUPR (my highest uncorrected value is 0.087 and my lowest is 0.059). I also have super low AUROC (highest 0.4, lowest 0.32) and Pearson correlation coefficients (highest 0.025, lowest -0.197). I read in the FAQ that this could be caused by a low number of background genes, but I have 7,683 background genes and 184 genes in my geneset_oi. I am following the step-by-step tutorial exactly, except that I slightly changed how DE_table_receiver is calculated: I used MAST and only included genes with a positive log fold change:

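library(Seurat)     # FindMarkers(); test.use = "MAST" needs the MAST package installed
library(tidyverse)  # %>%, filter(), pull(), rownames_to_column()

# DE in the receiver cells; only.pos = TRUE keeps only upregulated genes
DE_table_receiver <- FindMarkers(object = seurat_obj_receiver,
                                 ident.1 = condition_oi, ident.2 = condition_reference,
                                 test.use = "MAST", logfc.threshold = 0, only.pos = TRUE,
                                 min.pct = 0.05) %>% rownames_to_column("gene")

# Gene set of interest: significantly upregulated genes
geneset_oi <- DE_table_receiver %>% filter(p_val_adj <= 0.001 & avg_log2FC > 0.5) %>% pull(gene)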

Do you think these values just mean the model isn't a very good fit, or does the fact that they are often well below chance level suggest some other problem?
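
As a rough reference point for "chance level", the arithmetic below uses only the numbers reported in this post. It assumes that every gene in geneset_oi is also part of the background set, which the post does not state, so that the AUPR expected from a random ranking is approximately the fraction of background genes in the gene set. For AUROC the chance level is 0.5. The variable names are illustrative only.

```r
# Numbers reported in the post
n_background <- 7683   # background genes
n_geneset    <- 184    # genes in geneset_oi

# Expected AUPR of a random ranking is roughly the fraction of positives.
# Assumption: all geneset_oi genes are contained in the background set.
aupr_random <- n_geneset / n_background

aupr_reported <- c(highest = 0.087, lowest = 0.059)
aupr_fold_over_random <- aupr_reported / aupr_random  # how many times above baseline
aupr_minus_random     <- aupr_reported - aupr_random  # difference from baseline

# AUROC of a random ranking is 0.5 regardless of class balance
auroc_random   <- 0.5
auroc_reported <- c(highest = 0.40, lowest = 0.32)
auroc_below_chance <- auroc_reported < auroc_random

print(round(aupr_random, 4))
print(round(aupr_fold_over_random, 2))
print(auroc_below_chance)
```

Under that assumption, the reported AUPR values lie above the random baseline, while the reported AUROC values lie below 0.5. An AUROC below 0.5 means the gene-set genes tend to score lower than the other background genes.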