Divide by zero when calling `bootstrap_crossval` using 1 model and 1 rdm

Question

Aceticia opened this issue 9 months ago

Describe the bug
When using only 1 model and 1 rdm for bootstrap_crossval, I run into a division by zero error. I do this because I'm running a searchlight and it's computationally more feasible to separate my first-level and second-level analyses.

To Reproduce

import numpy as np
from rsatoolbox.rdm import RDMs
from rsatoolbox.model import ModelWeighted
from rsatoolbox.inference import bootstrap_crossval
model = ModelWeighted('m', np.random.rand(5, 10))
data = RDMs(np.random.rand(10))
bootstrap_crossval(model, data, boot_type="pattern")

Expected behavior
No error and obtain the result object.

Versions
rsatoolbox==0.1.4
Python 3.10.12

Additional context
The n/(n-1) factor in this function causes the issue. The function is called when constructing the Result object in bootstrap_crossval. I think this is easily fixable by just returning np.nan when n=1.

def _correct_1d(
        variance: NDArray,
        n_pattern: Optional[int] = None,
        n_rdm: Optional[int] = None):
    if (n_pattern is not None) and (n_rdm is not None):
        # uncorrected dual bootstrap?
        n = min(n_rdm, n_pattern)
    elif n_pattern is not None:
        n = n_pattern
    elif n_rdm is not None:
        n = n_rdm
    else:
        n = None
    if n is not None:
        variance = (n / (n - 1)) * variance
    return variance

Stack trace
I replaced my username with XXX here, everything else is as is.

/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/numpy/lib/function_base.py:520: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis, **keepdims_kw)
/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
  ret = um.true_divide(
/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/inference/evaluate.py:582: RuntimeWarning: Degrees of freedom <= 0 for slice
  var_mean = np.cov(
/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/numpy/lib/function_base.py:2748: RuntimeWarning: divide by zero encountered in divide
  c *= np.true_divide(1, fact)
/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/numpy/lib/function_base.py:2748: RuntimeWarning: invalid value encountered in multiply
  c *= np.true_divide(1, fact)
/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/inference/evaluate.py:586: RuntimeWarning: Degrees of freedom <= 0 for slice
  var_1.append(np.cov(np.concatenate([
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/inference/evaluate.py", line 599, in bootstrap_crossval
    result = Result(models, evaluations, method=method,
  File "/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/inference/result.py", line 68, in __init__
    extract_variances(variances, nc_included, n_rdm, n_pattern)
  File "/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/util/inference_util.py", line 575, in extract_variances
    model_variances = _correct_1d(model_variances, n_pattern, n_rdm)
  File "/Users/XXX/miniconda3/envs/nature2023/lib/python3.10/site-packages/rsatoolbox/util/inference_util.py", line 617, in _correct_1d
    variance = (n / (n - 1)) * variance
ZeroDivisionError: division by zero

Heiko Schütt · Answer 1 · Fri Nov 10 2023 01:38:37 GMT+0800 (China Standard Time)

First of all welcome! And thank you for reporting this bug!

I had a look and fixed this bug, I believe.
The bootstrap_crossval function passed the counts (n_rdm and n_pattern) through to the Result object without catching the case where we do not bootstrap over one of those dimensions. This should now be fixed.

The fix you originally proposed, just returning nan, would not work, I am afraid, because the main purpose of the bootstrap is to return the variances of the model evaluations. If you do not need those, you should probably fall back to a single crossvalidation and save a lot of computation time.
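
To make the failure and the described fix concrete, here is a minimal standalone sketch. `correct_1d` is a simplified copy of the `_correct_1d` logic quoted in the issue, operating on a plain float instead of an array. That the fix amounts to passing `None` for the dimension that is not bootstrapped is an assumption based on the description above; the actual patch may differ.

```python
from typing import Optional


def correct_1d(variance: float,
               n_pattern: Optional[int] = None,
               n_rdm: Optional[int] = None) -> float:
    """Simplified copy of the _correct_1d logic quoted in the issue."""
    if n_pattern is not None and n_rdm is not None:
        n = min(n_rdm, n_pattern)
    elif n_pattern is not None:
        n = n_pattern
    elif n_rdm is not None:
        n = n_rdm
    else:
        return variance
    # n / (n - 1) correction; undefined for n == 1
    return (n / (n - 1)) * variance


variance = 0.2

# Before the fix (assumed): n_rdm=1 is passed even though only patterns
# are bootstrapped, so n = min(1, 5) = 1 and the correction divides by zero.
try:
    correct_1d(variance, n_pattern=5, n_rdm=1)
    raised = False
except ZeroDivisionError:
    raised = True

# After the fix (assumed): the non-bootstrapped dimension is passed as None,
# so n = n_pattern = 5 and the correction factor is 5 / 4.
corrected = correct_1d(variance, n_pattern=5, n_rdm=None)
print(raised, corrected)
```

With `boot_type="pattern"` and a single RDM, only the pattern count should enter the correction, which is why the single RDM no longer triggers the error in this sketch.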

Xujin Chris Liu · Answer 2 · Sun Nov 12 2023 11:43:14 GMT+0800 (China Standard Time)

Thank you for the fast fix! Although I might need the variance later, I indeed only need the CV scores for now. To do the CV, I only found internal CV functions that I'm not sure how to use. Is there something more user-facing that I missed?

If _internal_cv is the only method I can use for now, could you clarify what pattern_idx should be?

EDIT: It looks like pattern_idx is the index used for resampling the rdm patterns/conditions. So if I want to use _internal_cv without any resampling, I think I should just pass np.arange(n_conditions)?

Heiko Schütt · Answer 3 · Mon Nov 13 2023 17:01:29 GMT+0800 (China Standard Time)

Running the crossvalidation without any bootstrapping requires two steps: First create training and test sets, then run the crossvalidation. This is described here:
https://rsatoolbox.readthedocs.io/en/stable/demo_bootstrap.html#Exercise-3:-Crossvalidation-for-flexible-models
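
As a rough illustration of the first step, the sketch below splits condition (pattern) indices into k folds, using each fold once as the test set and the remaining conditions as the training set. It is plain Python, not rsatoolbox code: `k_fold_condition_sets` is a hypothetical helper written for this example. The linked demo uses the library's own functions for both steps (to my recollection `rsatoolbox.inference.sets_k_fold` followed by `rsatoolbox.inference.crossval`; please check the linked page for the exact calls and arguments).

```python
from typing import List, Tuple


def k_fold_condition_sets(
        n_conditions: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split condition indices into k (train, test) pairs."""
    if not 2 <= k <= n_conditions:
        raise ValueError("k must be between 2 and n_conditions")
    indices = list(range(n_conditions))
    # Deal indices into k folds of near-equal size (no shuffling).
    folds = [indices[i::k] for i in range(k)]
    sets = []
    for test in folds:
        # Everything that is not in the test fold is used for training.
        train = [i for i in indices if i not in test]
        sets.append((train, test))
    return sets


# 5 conditions (10 dissimilarities, as in the reproduction above), 2 folds
for train, test in k_fold_condition_sets(5, 2):
    print("train:", train, "test:", test)
```

Note that each test set needs at least two conditions, since a single condition yields no dissimilarity to evaluate on. The second step, fitting the model on the training conditions and scoring it on the test conditions, is what the library's crossvalidation function handles.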