Hedgehog is unlikely to find counterexamples close to the bounds of a large `enum` range

Question


sol opened this issue 2 years ago

First things first, Hedgehog is awesome!

One minor stumbling block I came across is that Hedgehog is unlikely to find counterexamples that are close to the bounds of a large enum range.

Problem

Given a broken function

```haskell
isAscii :: Char -> Bool
-- broken implementation: erroneously classifies '\128' and '\1114111' as ASCII
isAscii c = c <= '\128' || c >= maxBound
```

and a property

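```haskell
import           Hedgehog (MonadGen, PropertyT, forAll, (===))
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

nonAscii :: MonadGen m => m Char
nonAscii = Gen.enum '\128' maxBound

prop_isAscii_returns_False_for_non_ascii_characters :: PropertyT IO ()
prop_isAscii_returns_False_for_non_ascii_characters = do
  c <- forAll nonAscii
  isAscii c === False
```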

it is unlikely that Hedgehog will find a counterexample:

```
ghci> check $ withTests 100_000 $ property prop_isAscii_returns_False_for_non_ascii_characters
  ✓ <interactive> passed 100000 tests.
```
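A quick estimate shows why 100,000 tests are not enough. This sketch assumes that `Gen.enum` with `Range.constant` draws each test's value uniformly and independently from the whole range, so the size plays no role. `rangeSize`, `badCount` and `hitProbability` are names made up for the illustration.

```haskell
-- Back-of-the-envelope estimate, assuming every test draws uniformly and
-- independently from ['\128' .. maxBound] (no dependence on the size).

-- Number of Chars in ['\128' .. maxBound]: 1114111 - 128 + 1
rangeSize :: Integer
rangeSize = toInteger (fromEnum (maxBound :: Char)) - 128 + 1

-- The broken isAscii misclassifies exactly two of them: '\128' and maxBound
badCount :: Integer
badCount = 2

-- Probability that at least one of n tests draws a misclassified Char
hitProbability :: Int -> Double
hitProbability n = 1 - (1 - p) ^ n
  where
    p = fromIntegral badCount / fromIntegral rangeSize
```

In GHCi, `hitProbability 100000` evaluates to about 0.16. A run of that length would therefore catch the bug only about once in six attempts.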

Possible solution

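The definition of `Gen.enum` uses `Range.constant`, which draws uniformly from the whole range regardless of the size parameter.

Switching to `Range.exponential` makes Hedgehog find counterexamples close to the lower bound, because at small sizes the range only covers values near its origin (here the lower bound).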

```haskell
enum'exponential :: (MonadGen m, Enum a) => a -> a -> m a
enum'exponential lo hi =
  fmap toEnum . Gen.integral $
    Range.exponential (fromEnum lo) (fromEnum hi)
```

However, this makes a counterexample close to the upper bound even less likely to be discovered.
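You can see the upper-bound problem directly on the range that `enum'exponential` builds for `nonAscii`. This sketch reads the range's upper limit at a given size with `Range.upperBound`. `nonAsciiExp` and `upperBoundsAt` are hypothetical names, and the explanation assumes Hedgehog runs its tests at sizes between 0 and 99.

```haskell
import qualified Hedgehog.Range as Range
import           Hedgehog.Range (Range, Size)

-- Hypothetical name: the Int range enum'exponential builds for nonAscii,
-- with origin 128 ('\128') and far end 1114111 (maxBound :: Char).
nonAsciiExp :: Range Int
nonAsciiExp =
  Range.exponential (fromEnum '\128') (fromEnum (maxBound :: Char))

-- Upper bound of that range at each requested size. At size 0 the range
-- is just [128, 128]; the upper bound grows exponentially with the size
-- and only reaches 1114111 at the maximum size, 99.
upperBoundsAt :: [Size] -> [(Size, Int)]
upperBoundsAt sizes = [ (sz, Range.upperBound sz nonAsciiExp) | sz <- sizes ]
```

Evaluating `upperBoundsAt [0, 50, 98, 99]` in GHCi shows the progression. The range starts as `[128, 128]` and only reaches `maxBound` at size 99. Even at that size, a value is still drawn uniformly from more than a million candidates.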

The easiest thing I could think of is

```haskell
enum'better :: (MonadGen m, Enum a) => a -> a -> m a
enum'better lo hi =
  Gen.choice
    [ enum'exponential lo hi
    , enum'exponential hi lo
    ]
```

which discovers counterexamples close to both the upper and the lower bound, albeit with altered shrinking behavior.

Any thoughts?

Nikos Baxevanis · Answer 1 · Thu Mar 02 2023

Thank you for reporting this! Good catch 🎯 🦔

IIRC, at the time `enum` was added, the exponential range didn't exist yet.

Phil Hazelden · Answer 2 · Thu Apr 13 2023

> `enum'exponential hi lo`

Hm, I haven't looked in depth, but I think this might rely on unspecified behavior. It ultimately calls `randomR`, and I don't see anything in between that swaps the bounds back into ascending order. (Maybe `Range.bounds` should take care of that?)
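You can check this concern at the `Range` level. If the range is built with its ends reversed, as `enum'exponential hi lo` does, `Range.bounds` returns its extents in descending order. Whether the random number generator called afterwards accepts that order is exactly the unspecified part, so this sketch only inspects the range. It uses a small hypothetical example, `reversedRange`, instead of the full `Char` range.

```haskell
import qualified Hedgehog.Range as Range
import           Hedgehog.Range (Range)

-- Hypothetical small example of the range enum'exponential hi lo builds:
-- origin 10 (the "hi" end), scaling towards 0 (the "lo" end).
reversedRange :: Range Int
reversedRange = Range.exponential 10 0

-- Range.bounds keeps the order the range was built in, so at the maximum
-- size this is (10, 0): the first component is the larger one.
-- Range.lowerBound / Range.upperBound normalise it with min / max.
reversedBoundsAtMax :: (Int, Int)
reversedBoundsAtMax = Range.bounds 99 reversedRange
```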

Minor suggestions:

- I'd add a third choice retaining the original behavior. Then elements in the middle are still 1/3 as likely to get picked as now, rather than only getting generated when the size is sufficiently large (which I think would be the case with just the exponentials).
- If the shrinking is a problem, it would be easy to use a variant of choice that doesn't shrink to a different list element (it would be defined using integral_ instead of integral to select the index).
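Both suggestions can be combined in one sketch. `choice_`, `enumNearLo`, `enumNearHi` and `enumBiased` are hypothetical names, not part of Hedgehog's API. `choice_` selects the index with `Gen.integral_`, as described above. `enumNearHi` measures a distance from `hi` instead of passing reversed bounds, so it does not depend on the unspecified behavior.

```haskell
import           Hedgehog (MonadGen)
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range

-- Hypothetical variant of Gen.choice: the index is picked with
-- Gen.integral_, which does not shrink, so a counterexample keeps
-- coming from the same generator while it shrinks.
choice_ :: MonadGen m => [m a] -> m a
choice_ [] = error "choice_: used with empty list"
choice_ gens = do
  i <- Gen.integral_ (Range.constant 0 (length gens - 1))
  gens !! i

-- Biased towards lo: same as enum'exponential from the question.
enumNearLo :: (MonadGen m, Enum a) => a -> a -> m a
enumNearLo lo hi =
  fmap toEnum . Gen.integral $
    Range.exponential (fromEnum lo) (fromEnum hi)

-- Biased towards hi without reversing the range: generate a distance d
-- in [0, hi - lo], growing exponentially from 0, and return hi - d.
-- Since d shrinks towards 0, the result shrinks towards hi.
enumNearHi :: (MonadGen m, Enum a) => a -> a -> m a
enumNearHi lo hi =
  fmap (\d -> toEnum (fromEnum hi - d)) . Gen.integral $
    Range.exponential 0 (fromEnum hi - fromEnum lo)

-- Three-way mix: near lo, near hi, and the original uniform Gen.enum,
-- so a third of the draws stay uniform over the whole range at any size.
enumBiased :: (MonadGen m, Enum a) => a -> a -> m a
enumBiased lo hi =
  choice_
    [ enumNearLo lo hi
    , enumNearHi lo hi
    , Gen.enum lo hi
    ]
```

With this generator, `nonAscii` could become `enumBiased '\128' maxBound`. At size 0 the first two choices produce exactly `'\128'` and `maxBound`, which are the two values the broken `isAscii` gets wrong.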