Welford's for SD is better

Question

rebcabin opened this issue 8 years ago

Bravo, btw. Reducible (online) statistics are great, and if you keep going down the same road, you will end up with Kalman filters and much much more (see references at the bottom).

Your algo for standard deviation squares first and then subtracts, which exposes it to catastrophic cancellation. Welford's fixes that: it is very similar to yours, but you subtract first (a little cleverly), then square. Here is a sketch of Welford's in Clojure. The Wikipedia reference is below.

;; Online mean: folds one datum into a {:mean, :count} map.
(defn running-mean
  ([]
   {:mean 0, :count 0})
  ([{:keys [mean count]} new-datum]
   (let [new-count (inc count)]
     {:mean  (+ (/ new-datum new-count) (* mean (/ count new-count)))
      :count new-count})))

;; Welford's update: :ssr is the running sum of squared residuals.
(defn running-stats
  ([]
   {:mean 0, :count 0, :ssr 0, :variance 0, :std-dev 0})
  ([{:keys [ssr mean] :as ostats} new-datum]
   (let [nrmean   (running-mean ostats new-datum),
         ;; residual against the old mean times residual against the new mean
         nssr     (+ ssr (* (- new-datum mean)
                            (- new-datum (:mean nrmean)))),
         ncount   (:count nrmean),
         ;; sample variance (n - 1 denominator); 0.0 until there are two data
         nvar     (if (> ncount 1), (/ nssr (dec ncount)), 0.0)
         nstd-dev (Math/sqrt nvar)]

     {:ssr      (double nssr),
      :mean     (double (:mean nrmean)),
      :count    ncount,
      :variance (double nvar),
      :std-dev  nstd-dev})))
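
For reference, the recurrences that the sketch above implements can be written out as follows. The symbols are labels chosen here for illustration: $\bar{x}_n$ corresponds to `:mean`, $S_n$ to `:ssr`, and $s_n^2$ to `:variance` after the $n$-th datum $x_n$.

```latex
\begin{aligned}
\bar{x}_n &= \frac{x_n}{n} + \frac{n-1}{n}\,\bar{x}_{n-1}
           = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}, \\
S_n       &= S_{n-1} + (x_n - \bar{x}_{n-1})\,(x_n - \bar{x}_n), \\
s_n^2     &= \frac{S_n}{n-1} \quad (n > 1),
\end{aligned}
\qquad \bar{x}_0 = 0,\; S_0 = 0 .
```

Every product in the $S_n$ update is formed from residuals, which stay on the scale of the spread of the data. That is what "subtract first, then square" amounts to.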

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

Also see http://vixra.org/abs/1609.0044 and http://vixra.org/abs/1606.0328

Christophe Grand · Answer 1 · Mon Jan 09 2017 23:51:42 GMT+0800 (China Standard Time)

thanks, fixed by 6047563 and in release 0.8.1

Christophe Grand · Answer 2 · Mon Jan 09 2017 23:57:21 GMT+0800 (China Standard Time)

I didn't dig too deep into it, but I tried a buffered variant of Welford's (to amortize the division cost over several items). At least in CLJS, it was slower.
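
The thread does not show what the buffered variant looked like, so the following is only one possible reading of the idea, not the code that was benchmarked. It summarises each buffer with a two-pass computation and folds the summaries together with the pairwise merge formula (Chan et al.) described on the Wikipedia page linked above. The names `chunk-stats`, `merge-stats`, and `buffered-variance` are hypothetical, and the input is assumed to hold at least two items.

```clojure
;; Summarise one buffer in two passes: one division for the mean,
;; none per item. Hypothetical helper, not part of any library.
(defn chunk-stats [xs]
  (let [n    (count xs)
        mean (/ (reduce + 0.0 xs) n)
        m2   (reduce + 0.0 (map #(let [d (- % mean)] (* d d)) xs))]
    {:n n :mean mean :m2 m2}))

;; Merge two summaries with the pairwise update of Chan et al.
;; Two more divisions per buffer, still none per item.
(defn merge-stats [a b]
  (let [na    (double (:n a))
        nb    (double (:n b))
        n     (+ na nb)
        delta (- (:mean b) (:mean a))]
    {:n    (+ (:n a) (:n b))
     :mean (+ (:mean a) (* delta (/ nb n)))
     :m2   (+ (:m2 a) (:m2 b) (* delta delta (/ (* na nb) n)))}))

;; Sample variance of xs, processed buffer-size items at a time.
(defn buffered-variance [buffer-size xs]
  (let [{:keys [n m2]} (->> (partition-all buffer-size xs)
                            (map chunk-stats)
                            (reduce merge-stats {:n 0 :mean 0.0 :m2 0.0}))]
    (/ m2 (dec n))))

;; Usage: three buffers of sizes 3, 3 and 2.
(buffered-variance 3 [2 4 4 4 5 5 7 9])
```

For this small data set the sum of squared residuals is 32, so the result should agree with 32/7 up to rounding. Whether the saved divisions outweigh the extra buffering work depends on the runtime, which fits the observation that the buffered version was slower in CLJS.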

Brian Beckman · Answer 3 · Thu Jul 27 2017 11:54:20 GMT+0800 (China Standard Time)

It's plausible that Welford's is slower, but I think it's demonstrably safer on data sets with wide dynamic range. When you square, big numbers (> 1) get bigger and small numbers (< 1) get smaller. You can get into a situation where the sum of squared data and the squared-mean term are both huge and nearly equal, so subtracting one from the other wipes out most of the significant digits. None of the formulas will give you any warning that this is happening, but Welford's will stave off disaster longer.
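
To make the failure mode concrete, here is a minimal sketch comparing the two approaches on four values that sit on top of a large offset. `naive-variance` is a textbook sum-of-squares formula assumed here as a stand-in; it is not the library's original code. `welford-variance` is a compact restatement of the update discussed in this thread, and the data set is made up for illustration.

```clojure
;; Textbook "square first, then subtract" sample variance.
;; An assumed stand-in, not the library's original implementation.
(defn naive-variance [xs]
  (let [n      (count xs)
        sum    (reduce + 0.0 xs)
        sum-sq (reduce + 0.0 (map #(* % %) xs))]
    (/ (- sum-sq (/ (* sum sum) n)) (dec n))))

;; Welford: subtract the running mean first, then multiply the residuals.
(defn welford-variance [xs]
  (let [step (fn [{:keys [n mean m2]} x]
               (let [new-n    (inc n)
                     delta    (- x mean)
                     new-mean (+ mean (/ delta new-n))]
                 {:n    new-n
                  :mean new-mean
                  :m2   (+ m2 (* delta (- x new-mean)))}))
        {:keys [n m2]} (reduce step {:n 0 :mean 0.0 :m2 0.0} xs)]
    (/ m2 (dec n))))

;; The spread is that of [4 7 13 16] (sample variance 30); the 1e9 offset
;; makes every square about 1e18, where doubles are spaced 128 apart.
(def data (mapv #(+ 1e9 %) [4.0 7.0 13.0 16.0]))

(println "naive:  " (naive-variance data))
(println "welford:" (welford-variance data))
```

With double arithmetic, `welford-variance` returns 30.0 for this data, while `naive-variance` lands far from 30 and raises no error or warning. Without the offset, both functions agree on 30.0.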