haskell / text

Haskell library for space- and time-efficient operations over Unicode text.

Home Page:http://hackage.haskell.org/package/text

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

t_decode_utf8_lenient fails on Windows without simdutf

Bodigrim opened this issue · comments

(Extracted from #528 (comment))

As witnessed in https://github.com/haskell/text/actions/runs/5507568869/jobs/10037691564?pr=528#step:6:228, cabal test -f-simdutf can fail on Windows with

      error recovery
        t_decode_utf8_lenient:                                FAIL (0.01s)
          *** Failed! Falsified (after 99 tests and 55 shrinks):
          ["!\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL","\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\128\NUL"]
          "!\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\65533\NUL" /= "!\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\128\NUL"
          Use --quickcheck-replay=572156 to reproduce.
          Use -p '/t_decode_utf8_lenient/' to rerun this test only.

Looking at the structure of the offending string, it's most likely haskell/bytestring#575, fixed in bytestring-0.11.5.0. We should raise the bound at which we switch between native validation and bytestring:

#if defined(SIMDUTF) || MIN_VERSION_bytestring(0,11,2)