JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

`DiscreteNonParametric` and `Categorical` Construction Issue

btmit opened this issue

Constructing a `Categorical` distribution seems to make a copy of the `p` vector. I see this in profiling, in `@btime` results, and in the fact that changes I make to the original vector after creating the `Categorical` do not show up in the distribution. There are three issues I see:

  1. The `Categorical` docstring includes the following: "Note: The input vector p is directly used as a field of the constructed distribution, without being copied." This seems incorrect.
  2. Performance issues in critical sections of code, where this allocation can really add up.
  3. Bugs such as the following:
```julia
using Distributions
x = rand(3, 5)
x = x ./ sum(x, dims=1)  # normalize so each column is a valid probability vector
c = Categorical.(eachcol(x))
```

```
julia> c = Categorical.(eachcol(x))
ERROR: MethodError: Cannot convert an object of type Vector{Float64} to an object of type SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}
```

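I believe the underlying issue is that the `DiscreteNonParametric` inner constructor tries to sort and reorder everything. The reordering creates a copy (a plain `Vector`), but the constructor does not update the type parameter to match, so for any input that is not already a `Vector` (such as a column view) the conversion back to the declared type fails.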
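
Here is a minimal sketch of the failure mode I have in mind. `ToyDist` is a hypothetical stand-in written for illustration, not the Distributions.jl source; it only assumes that the inner constructor reorders its input by indexing and then calls `new` with the original type parameter.

```julia
# Hypothetical stand-in type; NOT the actual Distributions.jl implementation.
struct ToyDist{Ps<:AbstractVector{<:Real}}
    p::Ps

    function ToyDist{Ps}(p::Ps) where {Ps<:AbstractVector{<:Real}}
        order = sortperm(p)
        # p[order] allocates a fresh Vector, but the type parameter is still Ps,
        # so `new` has to convert that Vector back to Ps.
        return new{Ps}(p[order])
    end
end

ToyDist(p::Ps) where {Ps<:AbstractVector{<:Real}} = ToyDist{Ps}(p)

# Case 1: a plain Vector works, but the field is a copy, not the input itself.
p = [0.2, 0.3, 0.5]
d = ToyDist(p)
p[1] = 0.9            # mutate the original after construction
aliased = d.p === p   # false here: the stored vector is a separate array

# Case 2: a column view has type SubArray, and the reordered Vector
# cannot be converted back to that SubArray type.
x = rand(3, 5)
col = view(x, :, 1)
err = try
    ToyDist(col)
    nothing
catch e
    e
end
```

In this sketch `d.p` keeps its original values after `p` is mutated, and `err` ends up holding a `MethodError`. Both symptoms come from the same indexing step, which is why I suspect one root cause for the docstring mismatch, the allocation, and the `eachcol` failure.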