Haddock crash when table contains "unicode" symbols
guibou opened this issue · comments
The following code is crashing haddock:
{- |
+-----+
| ✅ |
+-----+
-}
module Toto where
$ haddock --version
Haddock version 2.27.0, (c) Simon Marlow 2006
Ported to use the GHC API by David Waern 2006-2008
$ which haddock
/nix/store/80k5c2yalbmmgny0np0y7ayd864xqpj3-ghc-9.4.4/bin/haddock
$ haddock Toto.hs
<no location info>: error:
Data.Text.Internal.Fusion.Common.index: Index too large
CallStack (from HasCallStack):
error, called at libraries/text/src/Data/Text/Internal/Fusion/Common.hs:1180:24 in text-2.0.1:Data.Text.Internal.Fusion.Common
streamError, called at libraries/text/src/Data/Text/Internal/Fusion/Common.hs:1080:33 in text-2.0.1:Data.Text.Internal.Fusion.Common
indexI, called at libraries/text/src/Data/Text/Internal/Fusion.hs:249:9 in text-2.0.1:Data.Text.Internal.Fusion
index, called at libraries/text/src/Data/Text.hs:1839:13 in text-2.0.1:Data.Text
index, called at utils/haddock/haddock-library/src/Documentation/Haddock/Parser.hs:464:17 in main:Documentation.Haddock.Parser
haddock: Cannot typecheck modules
This is highly sensitive to whitespace. For example:
+-----+
| ✅ |
+-----+
works.
I suspect the problem is that line lengths are checked by byte count or by character count, and the two do not match because of the encoding.
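To make the suspected mismatch concrete, here is a minimal sketch that compares the two counts for a table row containing ✅ (U+2705). The names `charCount`, `utf8ByteCount`, `borderLine` and `cellLine` are made up for illustration, the row is not the exact failing input, and this is not a diagnosis of what Haddock's parser actually does.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Number of Unicode code points in a line.
charCount :: T.Text -> Int
charCount = T.length

-- | Number of bytes in the UTF-8 encoding of the same line.
utf8ByteCount :: T.Text -> Int
utf8ByteCount = BS.length . TE.encodeUtf8

-- Illustrative rows only (hypothetical; the original whitespace is unknown).
borderLine, cellLine :: T.Text
borderLine = T.pack "+-----+"
cellLine   = T.pack "| \x2705 |"
```

Loaded in GHCi, `charCount cellLine` is 5 while `utf8ByteCount cellLine` is 7, because ✅ takes three bytes in UTF-8. For the pure-ASCII `borderLine` both counts are 7, so a check that mixes the two measures could accept a row that is too short to index by character.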
This is known, see #718 (comment), where @phadej says:
There /will/ be a problem with UTF-8 as for tables we need to count characters. I won't do anything for that at this point.
I'm mostly opening the ticket for reference.
That said, it may be possible to make Haddock more robust here, so that it generates an invalid table or fails more gracefully instead of crashing.
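As a rough sketch of what "more graceful" could look like: the stack trace shows the crash coming from `Data.Text.index`, which calls `error` on an out-of-range index. A total variant would let the caller decide what to do with a short row. `safeIndex` below is a hypothetical helper, not part of `text` or of Haddock's code.

```haskell
import qualified Data.Text as T

-- | Hypothetical total variant of 'T.index': returns 'Nothing' instead of
-- calling 'error' when the index is negative or past the end of the text.
safeIndex :: T.Text -> Int -> Maybe Char
safeIndex t i
  | i < 0     = Nothing                        -- T.drop treats negative counts as 0
  | otherwise = fst <$> T.uncons (T.drop i t)  -- Nothing if fewer than i+1 characters
```

With something like this, a table row shorter than its border would yield `Nothing` at the missing column, and the parser could fall back to treating the block as plain text or report a located warning.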
Note: I'm using haddock with ghc 9.4 which uses text 2, but I've also observed the problem with ghc 9.2 and text 1.2.
Thank you for reporting this! ❤️