JuliaStats / StatsBase.jl

Basic statistics for Julia

Cumulant function is not numerically stable

mattcbro opened this issue.

The recursion used for calculating cumulants from a sequence of data is not numerically stable. Apparently this is known in the literature and is discussed in this paper:
https://www-2.rotman.utoronto.ca/~kan/papers/ncq.pdf
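For reference, here is a minimal sketch of the textbook moment-to-cumulant recursion, written in terms of the central moments μ_i. It is illustrative only. `recursive_cumulants` is a hypothetical name, and this may differ in detail from what StatsBase's `cumulant` actually does. The point is that for i ≥ 4 each κ_i is obtained by subtracting products of lower-order quantities from μ_i.

```julia
using Statistics

# Hypothetical helper (not the StatsBase API): cumulants κ_1, …, κ_n of `v`
# via the moment-to-cumulant recursion in terms of central moments:
#   κ_i = μ_i - Σ_{j=2}^{i-2} binomial(i-1, j-1) * κ_j * μ_{i-j}
function recursive_cumulants(v::AbstractVector{<:Real}, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    m = mean(v)
    d = v .- m
    # central moments μ_1, …, μ_n (μ_1 is zero up to rounding and is never used)
    μ = [mean(d .^ k) for k in 1:n]
    κ = zeros(n)
    κ[1] = m
    for i in 2:n
        s = μ[i]
        for j in 2:i-2
            # each subtracted term can be much larger than the final κ_i
            s -= binomial(i - 1, j - 1) * κ[j] * μ[i-j]
        end
        κ[i] = s
    end
    return κ
end

recursive_cumulants([1.0, 2.0, 3.0, 4.0], 4)  # [2.5, 1.25, 0.0, -2.125]
```

For Gaussian data the even central moments grow like (i-1)!! σ^i (for example μ_14 ≈ 135135 σ^14), while κ_i should be near zero for i ≥ 3. Each high-order κ_i is therefore a small difference of large terms, and errors in the estimated moments, whether from rounding or sampling, are not damped: they are multiplied by binomial coefficients and passed on to every higher order.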

Here is a simple test. For a Gaussian vector, all cumulants beyond the second should be close to zero, but the computed ones diverge.

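```julia
using StatsBase
using Plots
using Random

Random.seed!(1234)  # fixed seed so the run is reproducible

# create a long sampled vector from a standard normal distribution
N = 10_000
v = randn(N)

# compute and plot the cumulants of orders 1 to 15; for Gaussian data
# only the second (the variance, about 1) should be clearly nonzero
kix = 1:15
rkstats = cumulant(v, kix)
plot(kix, rkstats; xlabel = "order", ylabel = "cumulant", legend = false)
```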

Thank you so much for opening this issue! Would you be willing to open a PR implementing the more precise algorithms? (Is there any performance cost to doing so?)

SciPy's kstat restricts the order to 1 through 4, perhaps to avoid these numerical instabilities?

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstat.html
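SciPy's `kstat` returns k-statistics, the unbiased estimators of the cumulants. A minimal Julia sketch of the first four, using the standard power-sum formulas, might look like this. `kstats4` is a hypothetical helper, not part of StatsBase. Centering the data first is our own choice: k_2 to k_4 are unchanged by a shift, and centering reduces cancellation in the power sums.

```julia
using Statistics

# Hypothetical helper: k-statistics k_1, …, k_4 (unbiased estimators of the
# first four cumulants) from the power sums S_r = Σ d_i^r of the centered data.
function kstats4(v::AbstractVector{<:Real})
    length(v) >= 4 || throw(ArgumentError("need at least 4 observations"))
    N = float(length(v))  # floating-point N avoids Int overflow in N^4 for large samples
    m = mean(v)
    d = v .- m
    S1, S2, S3, S4 = [sum(d .^ r) for r in 1:4]
    k1 = m
    k2 = (N * S2 - S1^2) / (N * (N - 1))
    k3 = (2 * S1^3 - 3 * N * S1 * S2 + N^2 * S3) / (N * (N - 1) * (N - 2))
    k4 = (-6 * S1^4 + 12 * N * S1^2 * S2 - 3 * N * (N - 1) * S2^2 -
          4 * N * (N + 1) * S1 * S3 + N^2 * (N + 1) * S4) /
         (N * (N - 1) * (N - 2) * (N - 3))
    return [k1, k2, k3, k4]
end

kstats4([1.0, 2.0, 3.0, 4.0])  # [2.5, 5/3, 0.0, -10/3] up to rounding
```

For data like randn(10_000), k_3 and k_4 from this sketch should come out close to zero.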

I'll look into it. I haven't tried implementing the algorithm from the paper yet. Since it requires solving an eigenvalue problem, it may be slower, but correctness is more important.

That would be great, thank you!

It might also be worth checking what OnlineStats.jl does. In my experience it usually has cleaner, faster, or more accurate implementations than we do.