htmlEntityToUtf8 adds around 600 kb to binary size with -d:release on Windows
metagn opened this issue
https://github.com/soasme/nim-markdown/blob/master/src/markdownpkg/entities.nim
Tested by manually removing its use in my local Nimble instance:
# markdown.nim
proc escapeHTMLEntity*(doc: string): string =
  var entities = doc.findAll(re"&([^;]+);")
  result = doc
  for entity in entities:
    if not IGNORED_HTML_ENTITY.contains(entity):
      let utf8Char = entity.htmlEntityToUtf8
Size of small website builder compiled with -d:release:
    if not IGNORED_HTML_ENTITY.contains(entity):
      let utf8Char = entity#.htmlEntityToUtf8
Same compilation settings:
Converting this to a constant table should save a large amount of space. A build option to turn it off, such as -d:markdownNoEntities, might work as a temporary measure, though.
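A minimal sketch of what such a build switch could look like. The names `decodeEntity` and `resolveEntity` are hypothetical stand-ins (the library's real proc is `htmlEntityToUtf8`), and the three-entry `case` only represents the full generated list:

```nim
# Hypothetical stand-in for the large generated lookup proc.
proc decodeEntity(name: string): string =
  case name
  of "amp": "&"
  of "lt": "<"
  of "gt": ">"
  else: ""

# Entity names pass through untouched when built with -d:markdownNoEntities.
proc resolveEntity*(name: string): string =
  when defined(markdownNoEntities):
    # decodeEntity is never referenced on this branch, so dead code
    # elimination should be able to leave it out of the binary.
    result = name
  else:
    result = decodeEntity(name)
    if result.len == 0:
      result = name  # unknown entity: keep the original text
```

Compiling with `nim c -d:release -d:markdownNoEntities` would select the first branch at compile time; without the define, behavior is unchanged.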
Update: I tried changing it to a hash table, but it apparently does not save much space:
This makes sense because of the way case/of is optimized (case/of itself is probably faster than a hash table), but I expected it to have a bigger impact. My mistake.
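For reference, the two shapes being compared look roughly like this. This is only a sketch with a hypothetical three-entry subset (the full WHATWG list has over two thousand names), and `lookupCase` / `lookupHashed` are illustrative names, not the library's API:

```nim
import tables

# case/of version: the compiler decides how to dispatch on the string.
proc lookupCase(name: string): string =
  case name
  of "amp": "&"
  of "lt": "<"
  of "gt": ">"
  else: ""

# Hash table version: the same data stored in a Table built at startup.
let entityTable = {"amp": "&", "lt": "<", "gt": ">"}.toTable

proc lookupHashed(name: string): string =
  # Returns "" when the name is not a known entity.
  entityTable.getOrDefault(name, "")
```

Either way, every entity name and its replacement string still has to be stored in the binary, which may be part of why the size difference is small.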
What does save a little more space than that, though, is using an array of tuples and checking for equality on every lookup instead of hashing, sacrificing speed:
This is just a bad idea for performance. I would really rather just not have all this in my binary.
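The array-of-tuples variant can be sketched as below, again with a hypothetical four-entry subset and an illustrative proc name (`lookupLinear`). It shows why the approach trades speed for size: each lookup is a linear scan with a string comparison per entry.

```nim
# Compile-time array of (name, replacement) pairs; no hashing involved.
const entityPairs = [
  ("amp", "&"),
  ("lt", "<"),
  ("gt", ">"),
  ("copy", "©")
]

proc lookupLinear(name: string): string =
  # O(n) scan: compares the name against each entry in turn.
  for pair in entityPairs:
    if pair[0] == name:
      return pair[1]
  result = ""  # not a known entity
```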
Forgot to mention this is on Nim 1.0.4.
This was a workaround because the Nim standard library function htmlparser.entityToUtf8 can't translate all of the HTML entities defined in the CommonMark spec, in particular those listed in https://html.spec.whatwg.org/multipage/entities.json.
The current Nim implementation also converts entities through a hash (source code), btw.
I'll create an upstream issue for Nim reporting this, in the hope that more characters can be added to the standard library. If the proposal is approved, this module will no longer be needed in the library.
Adding an option markdownNoEntities will make nim-markdown incompatible with the CommonMark spec. I think correctness is very important as well.
Another way is to diff the above entities.json against the entity set in the current Nim implementation and add only the missing ones to nim-markdown. This is probably a solution that can both ensure correctness and reduce binary size without harming performance.
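A rough sketch of that diff, assuming the standard library's `htmlparser.entityToUtf8` returns an empty string for names it does not know (as it does in Nim 1.x) and that the spec file has already been parsed into a `JsonNode`. The proc name `missingEntities` is hypothetical:

```nim
import json, htmlparser, strutils

# Collect entity names from the spec that the stdlib cannot translate.
proc missingEntities(spec: JsonNode): seq[string] =
  result = @[]
  for key in spec.keys:
    # Spec keys look like "&amp;" (some legacy ones omit the ';').
    let name = key.strip(chars = {'&', ';'})
    if entityToUtf8(name).len == 0:
      result.add name
```

Running it on `parseFile("entities.json")` with a local copy of the spec file would give the list of names nim-markdown still has to carry itself; everything else could be delegated to `entityToUtf8`.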