GAA-UAM / scikit-fda

Functional Data Analysis Python package

Home Page: https://fda.readthedocs.io

Add standard deviation

vnmabus opened this issue

Add a method for computing the standard deviation of functional data, for both the discretized and the basis expansion representations.

One design question has to be settled first: which normalization coefficient std should apply, and whether that choice should be left to the user.

The definition of std provided in Kokoszka and Reimherr (2017) is:

$$(std_X(t) ) ^2=\frac{1}{N} \sum_{n=1}^{N} (X_n(t) - \overline{X}(t))^2.$$

This normalization by $N$ is the default used in numpy.var and numpy.std, the latter being the most natural function to use in the implementation of FDataGrid.std:

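import numpy as np
from skfda import FDataGrid

def std(X: FDataGrid) -> FDataGrid:
    # Pointwise standard deviation across samples; np.std divides by N (ddof=0).
    return X.copy(
        data_matrix=np.array([np.std(X.data_matrix, axis=0)]),
        sample_names=("standard deviation",),
    )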
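To make the normalization concrete, here is a small check on a plain NumPy array standing in for `data_matrix` (the values are made up, and no scikit-fda object is involved). It only illustrates that `np.std` with its default `ddof=0` matches the $1/N$ definition above, and that `ddof=1` gives the $1/(N-1)$ variant.

```python
import numpy as np

# Toy data matrix: N = 4 samples evaluated at 3 grid points (made-up values).
data = np.array([
    [1.0, 2.0, 0.0],
    [3.0, 2.0, 4.0],
    [5.0, 2.0, 0.0],
    [7.0, 2.0, 4.0],
])

n_samples = data.shape[0]
mean = data.mean(axis=0)

# Pointwise standard deviation written out from the 1/N definition.
std_by_hand = np.sqrt(((data - mean) ** 2).sum(axis=0) / n_samples)

# np.std with the default ddof=0 uses the same 1/N normalization.
std_numpy = np.std(data, axis=0)

# ddof=1 switches the divisor to N - 1.
std_unbiased = np.std(data, axis=0, ddof=1)
```

The two normalizations differ by a factor of $\sqrt{N/(N-1)}$ at every grid point, which is noticeable for small samples.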

However, the easiest implementation of FDataBasis.std uses the FDataBasis.cov method. FDataBasis.cov calculates the covariance using the formula:

$$K_X(t, s)=\frac{1}{N-1} \sum_{n=1}^{N} (X_n(t) - \overline{X}(t))(X_n(s) - \overline{X}(s)),$$

because $(N-1)$ is the default normalization used by numpy.cov.
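The mismatch can be reproduced with NumPy alone. The sketch below uses a random array as a stand-in for functional data evaluated on a grid (it does not call FDataBasis.cov) and compares the square root of the covariance diagonal with `np.std`.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3))  # 5 samples, 3 grid points
n_samples = data.shape[0]

# np.cov divides by N - 1 by default; rowvar=False treats columns as variables.
cov_default = np.cov(data, rowvar=False)
# Passing ddof=0 makes np.cov divide by N instead.
cov_ddof0 = np.cov(data, rowvar=False, ddof=0)

# Standard deviation taken from the diagonal of each covariance matrix.
std_from_cov_default = np.sqrt(np.diag(cov_default))
std_from_cov_ddof0 = np.sqrt(np.diag(cov_ddof0))

# np.std divides by N by default.
std_numpy = np.std(data, axis=0)
```

Only `std_from_cov_ddof0` agrees with `std_numpy`; the default covariance gives values larger by a factor of $\sqrt{N/(N-1)}$.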

A natural solution to this issue would be to make the signature of std similar to that of numpy.std, where there is a parameter:

ddof: int, optional
Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

However, adding this ddof parameter to std would also require adding a matching one to the cov function, so that FDataBasis.std can pass it through.
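A rough sketch of what a shared ddof could look like, written on plain arrays rather than on the scikit-fda classes. The names `pointwise_cov` and `pointwise_std` are hypothetical helpers for illustration only, not part of the package, and ddof is deliberately a required keyword so the sketch does not pick a default.

```python
import numpy as np


def pointwise_cov(data: np.ndarray, *, ddof: int) -> np.ndarray:
    """Covariance between grid points, dividing by N - ddof (hypothetical helper)."""
    n_samples = data.shape[0]
    if n_samples - ddof <= 0:
        raise ValueError("ddof must be smaller than the number of samples")
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (n_samples - ddof)


def pointwise_std(data: np.ndarray, *, ddof: int) -> np.ndarray:
    """Standard deviation from the covariance diagonal, with the same ddof."""
    return np.sqrt(np.diag(pointwise_cov(data, ddof=ddof)))


rng = np.random.default_rng(1)
sample = rng.normal(size=(6, 4))  # 6 samples, 4 grid points

std_population = pointwise_std(sample, ddof=0)  # matches np.std default
std_sample = pointwise_std(sample, ddof=1)      # matches np.cov default
```

Because std forwards ddof to cov, both functions always use the same divisor, whichever default is eventually chosen.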

I closed this issue by accident. I'm sorry.