stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)

Home Page:https://mc-stan.org/loo

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Report ESS 0 if khat too large

avehtari opened this issue · comments

Example output from https://discourse.mc-stan.org/t/understanding-loo-results-for-non-standard-contaminated-controls-model/25493

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     717   73.2%   151       
 (0.5, 0.7]   (ok)       133   13.6%   116       
   (0.7, 1]   (bad)       74    7.6%   38        
   (1, Inf)   (very bad)  56    5.7%   839       
See help('pareto-k-diagnostic') for details.

where the very bad category shows quite high effective sample size. PSIS paper demonstrates that the effective sample size estimate is unreliable when k>0.7. We are already reporting for MCSE NA if any khat>0.7, so I propose we report effective sample size 0 in these categories. This would reduce possible confusion.

Just report 0, or also store in loo object $diagnostics$ess value 0?

Why prefer ESS of 0 over NaN?

I agree with Seth, maybe it is more defensive to return NaN. Or is there a specific reason to return 0 here?

Why NaN instead of NA? Currently if any k>0.7, we report MCSE as NA

I don't have a strong preference for 0, but I would assume 0 would produce fewer questions. But as we report NA for MCSE, I'm with NA, too

Oh. I have no strong feelings on NA vs NaN. If NA is better I think that is good, but I think you said that MCSE returned NaN? I just think NA/NaN is better than 0 since it will cast a more explicit error and hence be more defensive. People would not accidentally use the value.

Maybe we can throw a warning when returning NA to avoid questions?

but I think you said that MCSE returned NaN?

Where? In the above comment, I write "Currently if any k>0.7, we report MCSE as NA"

NA seems better, since IIUC, it indicates the value is unknown or unavailable.

I just think NA/NaN is better than 0 since it will cast a more explicit error and hence be more defensive. People would not accidentally use the value.

NA (or NaN) is not an error. I'm happy if they use ESS 0, when we know the efficiency is very close to 0.

Maybe we can throw a warning when returning NA to avoid questions?

I think we already have too many warnings thrown that are annoying after the first time. I'm fine adding an explanation in the output if there are k>0.7 as we have planned to do anyway.

Sorry. I see ” We are already reporting for MCSE NaN if any khat>0.7, so I propose we report…” in the comment above?

I meant that if people would use the n_eff they would explicitly need to set its value to 0 for k>0.7. Hence it would need an active decision to use it. The current recommendations is to not use the estimates of elpd_i in the first place if k>0.7?

I agree, a warning is not good. I think the current documentation is quite clear. Maybe add a sentance there that the NA can be replaced with zero under the paragraph of effective sample size? If we go with a defensive solution.

Although, this is no big thing for me so just decide what you think is best. Its just me that have a very defensive coding style.

ps. When reading I realized that a ref in the docs is incorrect. I guess Vehtari et al (2017b) should be Vehtari et al (2019).

Sorry. I see ” We are already reporting for MCSE NaN if any khat>0.7, so I propose we report…” in the comment above?

Oh, in the first post! I fixed that. The actual printed output is NA.

I meant that if people would use the n_eff

btw. we're switching to use ESS instead of n_eff #192

they would explicitly need to set its value to 0 for k>0.7.

Based on the questions about the current NA, many novice users don't know the meaning of NA. For ESS, 0 might be more informative.

The current recommendations is to not use the estimates of elpd_i in the first place if k>0.7?

The current recommendation is not to use it, if someone's life depend on it (or something similar). In early phases of the workflow you may use it with caution. Of course in the output we need to opt for a short message, and then provide the more cautious explanation.

  • 0: Pros: might be easier to interpret in comparison of other ESS values (ie 0 is smaller than any positive value). Pro or cons: if used to estimate MCSE produces Inf, while we're now reporting NA.
  • NA: Pros: would follow the current use of NA for MCSE, would mean more of "we don't know" than certain 0.

I think this is a good summary. I would add that using NA would probably nudge users to read the Pareteto-k-diagnostic docs with higher probability / be more defensive.

I still lean more toward NA, but I have no strong opinion here.

I now fixed the refs in PR #194 as well.