haskell / haddock

Haskell Documentation Tool

Home Page: www.haskell.org/haddock/

Haddock crashes when a table contains "unicode" symbols

guibou opened this issue

The following code is crashing haddock:

{- |

+-----+
| ✅  |
+-----+

-}
module Toto where

$ haddock --version
Haddock version 2.27.0, (c) Simon Marlow 2006
Ported to use the GHC API by David Waern 2006-2008
$ which haddock
/nix/store/80k5c2yalbmmgny0np0y7ayd864xqpj3-ghc-9.4.4/bin/haddock
$ haddock Toto.hs  

<no location info>: error:
    Data.Text.Internal.Fusion.Common.index: Index too large
CallStack (from HasCallStack):
  error, called at libraries/text/src/Data/Text/Internal/Fusion/Common.hs:1180:24 in text-2.0.1:Data.Text.Internal.Fusion.Common
  streamError, called at libraries/text/src/Data/Text/Internal/Fusion/Common.hs:1080:33 in text-2.0.1:Data.Text.Internal.Fusion.Common
  indexI, called at libraries/text/src/Data/Text/Internal/Fusion.hs:249:9 in text-2.0.1:Data.Text.Internal.Fusion
  index, called at libraries/text/src/Data/Text.hs:1839:13 in text-2.0.1:Data.Text
  index, called at utils/haddock/haddock-library/src/Documentation/Haddock/Parser.hs:464:17 in main:Documentation.Haddock.Parser
haddock: Cannot typecheck modules

This is highly sensitive to whitespace; for example:

+-----+
| ✅   |
+-----+

works.

I suspect the problem is that line lengths are checked by number of bytes or by number of characters, and those counts do not match what is displayed because of the encoding.
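To make the mismatch concrete, here is a minimal sketch that measures the lines from the two examples above. It assumes the rows are exactly as rendered here (the check mark is the single code point U+2705, followed by two spaces in the crashing row and three in the working one). The names `utf8Width`, `utf8Length`, and `measure` are made up for this illustration and are not part of haddock.

```haskell
import Data.Char (ord)

-- Number of bytes a single character occupies in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Length of a line in UTF-8 bytes.
utf8Length :: String -> Int
utf8Length = sum . map utf8Width

-- The lines from the report (assumed transcription, see above).
border, crashingRow, workingRow :: String
border      = "+-----+"
crashingRow = "| \x2705  |"   -- two spaces after the check mark
workingRow  = "| \x2705   |"  -- three spaces after the check mark

-- (number of characters, number of UTF-8 bytes) for a line.
measure :: String -> (Int, Int)
measure l = (length l, utf8Length l)
```

Under those assumptions, `measure border` is `(7, 7)`, `measure crashingRow` is `(6, 8)`, and `measure workingRow` is `(7, 9)`. The crashing row looks aligned in a terminal that draws the check mark two columns wide, but it is one character shorter than the border. That would be consistent with the "Index too large" error if the parser indexes each row at a column position taken from the border, though I have not confirmed this against the parser source.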

This is known, see #718 (comment), where @phadej says:

There /will/ be a problem with UTF-8 as for tables we need to count characters. I won't do anything for that at this point.

I'm mostly opening the ticket for reference.

That said, it may be possible to make the parser more robust, so that it either generates an invalid table or fails more gracefully.
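As a rough sketch of what "more robust" could look like, the partial `Data.Text.index` call from the stack trace could be wrapped in a total variant that returns `Nothing` when the index is out of range. The names `safeIndex` and `hasSeparatorAt` are hypothetical and are not taken from haddock's parser; this only illustrates the idea and does not claim to match how the table parser is structured.

```haskell
import qualified Data.Text as T

-- Total variant of T.index: returns Nothing instead of throwing
-- when the index falls outside the text.
safeIndex :: T.Text -> Int -> Maybe Char
safeIndex t i
  | i >= 0 && i < T.length t = Just (T.index t i)
  | otherwise                = Nothing

-- Hypothetical helper: is there a cell separator ('|' or '+') at
-- character column i of this row? A row that is too short simply
-- answers False instead of crashing.
hasSeparatorAt :: Int -> T.Text -> Bool
hasSeparatorAt i row = safeIndex row i `elem` [Just '|', Just '+']
```

With a check like this, a row that is shorter than the border would be treated as malformed, and the caller could decide whether to fall back to plain text or report a located error.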

Note: I'm using haddock with GHC 9.4, which uses text 2, but I've also observed the problem with GHC 9.2 and text 1.2.

Thank you for reporting this! ❤️