statrs-dev / statrs

Statistical computation library for Rust

Home Page: https://docs.rs/statrs/latest/statrs/

Serde feature for serialization/deserialization

ndebuhr opened this issue

An optional "serde" feature appears in crates like rand, ndarray, and nalgebra. For a project that requires serialization and deserialization of statrs distributions, it looks as though I'd have to copy over all the distribution structs and create a serializable/deserializable wrapper or interface for them. Before doing that, I'd like to check whether I'm missing an obvious existing solution and, if not, whether it would be valuable for me to take a stab at contributing a serde feature to the statrs repo.

Can you elaborate on a use case necessitating the serialisation of such a struct?
I suppose having universal serde support means there's one thing less to have to think about, but I'm still curious whether there's a concrete use case.

@troublescooter, thanks for the quick reply. For some graduate school work, I'm building a simulation engine for stochastic models. These models are defined by the user in YAML on a webpage. When the user wants to run a simulation, the JavaScript passes the YAML to the Rust simulator core (compiled to WebAssembly), the YAML specification is deserialized, and the simulation is executed. If the statrs distribution structs were directly serializable/deserializable, that would substantially reduce the interface/wrapper code. Example YAML specified by a user:

# Exponential
generator:
  interarrival:
    rate: 0.5
# Gamma
processor:
  service:
    shape: 0.5
    rate: 0.75

In this simplified example, the Rust structs would be something like this (if a serde feature were added to statrs):

use serde::{Deserialize, Serialize};
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
#[derive(Clone, Default, Serialize, Deserialize)]
pub struct Simulation {
    generator: Generator,
    processor: Processor,
}

#[derive(Clone, Default, Serialize, Deserialize)]
pub struct Generator {
    // statrs names its exponential distribution `Exp`
    interarrival: statrs::distribution::Exp,
}

#[derive(Clone, Default, Serialize, Deserialize)]
pub struct Processor {
    service: statrs::distribution::Gamma,
}

The serialization is less important than the deserialization, but the status quo among other crates appears to be implementing them both simultaneously.

For this use case it would be better to do wrapper structs anyway. Serialisation leaks implementation details, which would be particularly awful with the MultivariateNormal distribution which precomputes and stores additional matrices inside the struct. Best bet is probably wrapping the arguments handed to the constructors.
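The wrapper approach suggested above could look roughly like the following sketch. It is kept dependency-free so it compiles on its own: `GammaParams` and `GammaDist` are hypothetical names, and `GammaDist` is only a local stand-in for a library distribution type, not statrs's actual code. In a real project the parameter struct would carry the serde derives, and the conversion would call the library's own constructor.

```rust
use std::convert::TryFrom;

/// Plain parameter struct: the only thing that would be (de)serialized.
/// With serde available, `#[derive(Serialize, Deserialize)]` would go here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GammaParams {
    pub shape: f64,
    pub rate: f64,
}

/// Local stand-in for a library distribution type (NOT statrs's code).
/// Its fields stay private, so any internal or precomputed state never
/// becomes part of the serialized format.
#[derive(Debug)]
pub struct GammaDist {
    shape: f64,
    rate: f64,
}

impl GammaDist {
    /// Validating constructor: both parameters must be finite and positive.
    pub fn new(shape: f64, rate: f64) -> Result<Self, String> {
        let valid = |v: f64| v.is_finite() && v > 0.0;
        if valid(shape) && valid(rate) {
            Ok(Self { shape, rate })
        } else {
            Err(format!("invalid gamma parameters: shape={}, rate={}", shape, rate))
        }
    }

    /// Mean of a gamma distribution in the shape/rate parameterization.
    pub fn mean(&self) -> f64 {
        self.shape / self.rate
    }
}

/// The wrapper's whole job: hand the stored arguments to the constructor,
/// so validation happens once, right after deserialization.
impl TryFrom<GammaParams> for GammaDist {
    type Error = String;

    fn try_from(p: GammaParams) -> Result<Self, Self::Error> {
        GammaDist::new(p.shape, p.rate)
    }
}
```

With this shape, the YAML only ever describes `shape` and `rate`, and a call such as `GammaDist::try_from(params)` rejects invalid input before the simulation starts. Swapping the stand-in for a real distribution type should only change the body of `try_from`.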

Makes sense. Thanks @troublescooter. As I continue with this project, I'll reach out if it looks like there is anything valuable I can contribute to the statrs repo. I really appreciate the work you and the team have put into this wonderful crate. As an aside, the Empirical distribution you added recently is really nice. Is there a plan to cut a v0.14 sometime soon?