Consider improving error handling and `StatsError` type

Question

FreezyLemon opened this issue 2 months ago

`enum StatsError` is supposed to be an

> Enumeration of possible errors thrown within the statrs library.

This indicates that it should not try to be a generic error type for any kind of statistics calculations, but instead only concern itself with the errors produced in statrs.

Under that assumption, the type currently has some inconsistencies and outdated practices (see the API guidelines for reference):

- Add a test that StatsError implements both Sync and Send (this is more of a formality, but it is trivial to implement and good future-proofing).
- Error::description is deprecated and should not be implemented.
- There are a few unused variants (ArgNotNegative, ArgIntervalExclMin, etc.); these are leftovers from older versions and should be removed because they're no longer needed by statrs.
- There's at least one case of a function returning a Result<T> that cannot return an error: Empirical::new. There's no real reason for this infallible API to return a Result<T, E>. There might be others.
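The first two points can be sketched as follows. This is a minimal illustration, not the statrs code: `DemoError`, its variants, and the helper `assert_send_sync` are hypothetical names chosen for the example. It shows an `Error` impl that relies on `Display` instead of the deprecated `Error::description`, plus a compile-time check that the type is `Send + Sync`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for an error enum; NOT the actual `StatsError`.
#[derive(Debug, Clone, PartialEq)]
pub enum DemoError {
    BadParams,
    ArgMustBePositive(&'static str),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::BadParams => write!(f, "bad distribution parameters"),
            DemoError::ArgMustBePositive(name) => {
                write!(f, "argument {} must be positive", name)
            }
        }
    }
}

// `Error::description` is deprecated, so the impl body stays empty;
// the human-readable message comes from `Display`.
impl Error for DemoError {}

/// Compiles only when `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Stops compiling if `DemoError` ever loses `Send` or `Sync`,
/// for example after adding an `Rc` field to a variant.
pub fn demo_error_is_send_sync() {
    assert_send_sync::<DemoError>();
}
```

In a real crate the last function would typically live in a `#[test]`; the check happens at compile time, so the test body never needs to assert anything at run time.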

I realize that most of these are breaking changes, but seeing that the crate is pre-1.0, I don't think there's a big problem doing this.

Other things that could be improved:
StatsError is big: 40 bytes on a Linux x64 target. This is because there are variants which contain 2x &str (2 x 16 = 32 bytes plus discriminant and padding). Is it really necessary to have strings in the error type? The implementation could be replaced mostly 1:1 with some sort of ArgName enum, but there might be an even better solution that does not need this argument name wrapping at all.
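To make the size argument concrete, here is a rough sketch comparing a string-backed error enum with one that uses a small argument-name enum. Everything in it is an assumption for illustration: `StrBackedError`, `CompactError`, and the `ArgName` variants are hypothetical and do not mirror the real `StatsError` variants.

```rust
use std::mem::size_of;

/// Hypothetical error enum that identifies arguments by string slice.
#[derive(Debug, Clone, PartialEq)]
pub enum StrBackedError {
    BadParams,
    ArgMustBePositive(&'static str),
    // Two fat pointers: this variant alone needs 2 * size_of::<&str>() bytes.
    ArgGt(&'static str, &'static str),
}

/// Hypothetical closed set of argument names (assumed, not from statrs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgName {
    Shape,
    Rate,
    Min,
    Max,
}

/// Same shape as `StrBackedError`, but with `ArgName` instead of `&str`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactError {
    BadParams,
    ArgMustBePositive(ArgName),
    ArgGt(ArgName, ArgName),
}

/// Returns (size of the string-backed enum, size of the compact enum) in bytes.
pub fn error_sizes() -> (usize, usize) {
    (size_of::<StrBackedError>(), size_of::<CompactError>())
}
```

The exact numbers depend on the target and on the compiler's layout decisions, so the sketch only reports them rather than hard-coding an expectation. The trade-off is that a closed `ArgName` enum has to be extended whenever a new parameter name appears.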

All `new` functions seem to just return StatsError::BadParams whenever the params are outside of the defined/allowed range. Is there a good reason for these to be so vague when compared to the more specific errors returned by other functions? After all, the more specific errors already exist; why not return more exact error information? There might even be value in providing multiple error types, to have errors that are more specific to the exact API being used.
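One possible shape for the "multiple error types" idea is a small error enum per constructor. The sketch below is purely illustrative: `UniformSketch` and `UniformSketchError` are hypothetical names, and the validation rules are assumptions made for the example, not the rules statrs applies.

```rust
use std::fmt;

/// Hypothetical per-API error type: each variant names one failed check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UniformSketchError {
    /// `min` is NaN or infinite.
    MinInvalid,
    /// `max` is NaN or infinite.
    MaxInvalid,
    /// `min` is not strictly less than `max`.
    MinNotLessThanMax,
}

impl fmt::Display for UniformSketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            UniformSketchError::MinInvalid => "min must be finite",
            UniformSketchError::MaxInvalid => "max must be finite",
            UniformSketchError::MinNotLessThanMax => "min must be less than max",
        };
        f.write_str(msg)
    }
}

impl std::error::Error for UniformSketchError {}

/// Hypothetical distribution-like struct used only for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UniformSketch {
    min: f64,
    max: f64,
}

impl UniformSketch {
    /// Reports which parameter check failed instead of a blanket `BadParams`.
    pub fn new(min: f64, max: f64) -> Result<Self, UniformSketchError> {
        if !min.is_finite() {
            return Err(UniformSketchError::MinInvalid);
        }
        if !max.is_finite() {
            return Err(UniformSketchError::MaxInvalid);
        }
        if min >= max {
            return Err(UniformSketchError::MinNotLessThanMax);
        }
        Ok(Self { min, max })
    }
}
```

A caller can then match on the exact failure, and the error type carries no strings, which also keeps it small.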

Orion Yeung · Answer 1 · Sun Apr 28 2024 10:56:06 GMT+0800 (China Standard Time)

I do see good reason for all of these and I'd be open to making changes for all but the infallible `new` not returning a Result.

All other public structs defined in the distribution module have a `new` method implemented that returns Result, so it does provide consistency. Perhaps if Empirical were in a different module or its `new` had a different name?

FreezyLemon · Answer 2 · Mon Apr 29 2024 00:50:13 GMT+0800 (China Standard Time)

> All other public structs defined in the distribution module have a `new` method implemented that returns Result, so it does provide consistency. Perhaps if Empirical were in a different module or its `new` had a different name?

I see your point about consistency. I would personally value the expressiveness ("this call cannot fail") over the consistency ("all constructors return a Result and might need error handling"), but it doesn't matter much tbh.
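The two positions can be compared side by side in a small sketch. The names `EmpiricalSketch`, `SketchError`, and `new_fallible` are hypothetical and exist only for the comparison; this is not the statrs `Empirical` type.

```rust
/// Hypothetical error type, present only so the fallible signature compiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchError {
    BadParams,
}

/// Hypothetical stand-in for an empirical distribution.
#[derive(Debug, Clone, PartialEq)]
pub struct EmpiricalSketch {
    samples: Vec<f64>,
}

impl EmpiricalSketch {
    /// "Consistent" style: returns `Result` although no code path yields `Err`.
    pub fn new_fallible() -> Result<Self, SketchError> {
        Ok(Self { samples: Vec::new() })
    }

    /// "Expressive" style: the signature itself says the call cannot fail.
    pub fn new() -> Self {
        Self { samples: Vec::new() }
    }

    /// Number of samples recorded so far.
    pub fn len(&self) -> usize {
        self.samples.len()
    }
}
```

With the first signature every caller has to write `?`, `unwrap()`, or a `match` for an error that never occurs; with the second the call site is just `EmpiricalSketch::new()`.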

Hmm, I'm not sure about renaming the `new` function; it's a widespread naming convention in the Rust ecosystem and what most users would expect. Maybe a similar name like new_<something> so people can quickly find it in their IDEs.

Orion Yeung · Answer 3 · Mon Apr 29 2024 06:02:47 GMT+0800 (China Standard Time)

Perhaps an impl Default over having Empirical::new?
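As a rough idea of what the `Default` route could look like, here is a sketch with a hypothetical `EmpiricalDefaultSketch` type. The `add` and `mean` methods are assumptions added so the example has something to exercise; they are not taken from statrs.

```rust
/// Hypothetical stand-in; the empty state comes from `Default`, not from `new`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct EmpiricalDefaultSketch {
    samples: Vec<f64>,
}

impl EmpiricalDefaultSketch {
    /// Records one observation.
    pub fn add(&mut self, x: f64) {
        self.samples.push(x);
    }

    /// Arithmetic mean of the recorded samples, or `None` when empty.
    pub fn mean(&self) -> Option<f64> {
        if self.samples.is_empty() {
            None
        } else {
            Some(self.samples.iter().sum::<f64>() / self.samples.len() as f64)
        }
    }
}
```

Usage would be `let mut e = EmpiricalDefaultSketch::default();` followed by calls to `add`. This keeps the "every `new` returns a Result" rule intact for the other distributions while still giving the infallible case an infallible constructor.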

Regardless, the overall discussion you bring up on the error type is valid. I'd merge an effort that does any of:

- scaling it down
- making it adhere closer to API guidelines
- returning StatsError::BadParams less where possible.