Hedgehog is unlikely to find counterexamples close to the bounds of a large `enum` range
sol opened this issue · comments
First things first, Hedgehog is awesome!
One minor stumbling block I came across is that Hedgehog is unlikely to find counterexamples that are close to the bounds of a large enum
range.
Problem
Given a broken function
isAscii :: Char -> Bool
-- broken implementation: erroneously classifies '\128' and '\1114111' as ASCII
isAscii c = c <= '\128' || c >= maxBound
and a property
nonAscii :: MonadGen m => m Char
nonAscii = enum '\128' maxBound
prop_isAscii_returns_False_for_non_ascii_characters :: PropertyT IO ()
prop_isAscii_returns_False_for_non_ascii_characters = do
c <- forAll nonAscii
isAscii c === False
it is unlikely that Hedgehog will find a counterexample:
ghci> check $ withTests 100_000 $ property prop
✓ <interactive> passed 100000 tests.
Possible solution
The definition of enum
uses Range.constant
.
Changing this to Range.exponential
will make Hedgehog discover counterexamples close to the lower bound.
enum'exponential :: (MonadGen m, Enum a) => a -> a -> m a
enum'exponential lo hi =
fmap toEnum . Gen.integral $
Range.exponential (fromEnum lo) (fromEnum hi)
However, this will make it even more unlikely to discover a counterexample close to the upper bound.
The easiest thing I could think of is
enum'better :: (MonadGen m, Enum a) => a -> a -> m a
enum'better lo hi = Gen.choice [
enum'exponential lo hi
, enum'exponential hi lo
]
which will discover counterexamples both close to the upper / lower bound, albeit with altered shrinking behavior.
Any thoughts?
Thank you for reporting this! Good catch 🎯 🦔
IIRC, at the time enum was added, exponential range wasn't there yet.
enum'exponential hi lo
Hm, I haven't looked in depth but I think this might rely on unspecified behavior. It'll ultimately call randomR
, and I don't see anything in between that causes the bounds to be swapped. (Maybe Range.bounds
should take care of that?)
Minor suggestions:
- I'd add a third choice retaining the original behavior. Then elements in the middle are still 1/3 as likely to get picked as now, rather than only getting generated when the size is sufficiently large (which I think would be the case with just the exponentials).
- If the shrinking is a problem, it would be easy to use a variant of
choice
that doesn't shrink to a different list element (it would be defined usingintegral_
instead ofintegral
to select the index).