webhintio / hint

💡 A hinting engine for the web

Home Page: https://webhint.io/

[Bug] Crash if there is a <200b> character in an HTML file

ahangarha opened this issue · comments

🐞 Bug report

Description

Today I received a question about hint failing to check an HTML file. This was the error:

Run npx hint .
AnalyzerError: Cannot read property 'getLocation' of undefined
    at Analyzer.analyze (/home/runner/work/Awesome-Books/Awesome-Books/node_modules/hint/dist/src/lib/analyzer.js:151:23)
    at runNextTicks (internal/process/task_queues.js:62:5)
    at processImmediate (internal/timers.js:434:9) {
  status: 'AnalyzeError'
}
Error: Process completed with exit code 1.

After investigating several possibilities, I realized the HTML file contains some Unicode characters with the code point U+200B.

By removing those characters, hint started working again.
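If you need to locate these characters without an editor that highlights them, a small Node.js script can do it. This is a minimal sketch: `findZeroWidthSpaces` and `stripZeroWidthSpaces` are hypothetical helper names (not part of webhint), and the sample markup is made up for illustration.

```javascript
// Hypothetical helpers for finding and removing U+200B ZERO WIDTH SPACE.
const ZWSP = '\u200B';

// Return the 1-based line/column of every U+200B in `text`.
// Columns are counted in UTF-16 code units (U+200B is one code unit).
function findZeroWidthSpaces(text) {
  const hits = [];
  text.split('\n').forEach((line, i) => {
    let col = line.indexOf(ZWSP);
    while (col !== -1) {
      hits.push({ line: i + 1, column: col + 1 });
      col = line.indexOf(ZWSP, col + 1);
    }
  });
  return hits;
}

// Return `text` with every U+200B removed.
function stripZeroWidthSpaces(text) {
  return text.split(ZWSP).join('');
}

// Made-up sample with a ZWSP right after <head>.
const sample =
  '<!DOCTYPE html>\n<html lang="en">\n<head>\u200B\n  <meta charset="UTF-8">\n</head>\n</html>';

console.log(findZeroWidthSpaces(sample));                  // [ { line: 3, column: 7 } ]
console.log(stripZeroWidthSpaces(sample).includes(ZWSP));  // false
```

To scan a real file, pass the helpers the result of `fs.readFileSync(path, 'utf8')`.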

Details

  • hint version: 6.1.10
  • OS: Ubuntu 20.04
  • .hintrc content: the issue was observed even without a .hintrc.

Thanks a lot for reporting this crash. Looks like it might be nice to make webhint more resilient here, and perhaps have a nicer error message. @antross any thoughts?

Also, a PR would be very much welcome here if people wanted to contribute one.

Looks like U+200B is a zero-width space in Unicode. I would think this is acceptable whitespace.
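One caveat worth checking: U+200B is a format character (general category Cf), not a Unicode White_Space character, so JavaScript's `\s` and `trim()` treat it as content. A quick Node.js check follows; the pattern `/^<head[^>]*>\s*<meta/` has the same shape as the `<head>` check in hint-meta-charset-utf-8 quoted later in this thread, and `headThenMeta` is just a local name for it.

```javascript
// U+200B ZERO WIDTH SPACE has General_Category Cf (format), not Zs, and is
// not in the ECMAScript WhiteSpace set, so `\s` and trim() leave it alone.
const ZWSP = '\u200B';

console.log(/\s/.test(ZWSP));     // false
console.log(ZWSP.trim().length);  // 1
console.log(/\s/.test('\u00A0')); // true: NO-BREAK SPACE (Zs) does count

// An "only whitespace between <head> and <meta>" pattern therefore fails
// as soon as a ZWSP sits in between.
const headThenMeta = /^<head[^>]*>\s*<meta/;
console.log(headThenMeta.test('<head>\n  <meta charset="UTF-8">'));       // true
console.log(headThenMeta.test('<head>\u200B\n  <meta charset="UTF-8">')); // false
```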

I tested my own file with these characters in both the CLI and the VS Code extension and was unable to reproduce the crash. VS Code also helpfully highlighted where the zero-width space was.

@ahangarha we may need a specific repro file added to the bug to make this actionable as there must be something in addition to this character leading to the crash.

Do you need a repo or a simple HTML file? Even better would be a unit test for it rather than checking an actual file.

This is the commit where we noticed the issue: Regiss05/Awesome-Books@6af76b4

Thanks @ahangarha, now I can see what's going on 😊

Turns out the zero-width spaces are causing the HTML parser to open the <body> tag early (as only certain whitespace is treated as non-content by the spec). This is likely per-spec and results in the following DOM:
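A simplified model of that rule, as a sketch: in the "before head", "in head" and "after head" insertion modes, the HTML tree-construction algorithm only treats tab, LF, FF, CR and space as whitespace; any other character falls through to "anything else", which ends <head> and inserts <body>. `opensBodyEarly` is a hypothetical helper that checks just this condition; it is not a parser.

```javascript
// The only characters treated as whitespace in the "before head",
// "in head" and "after head" insertion modes.
const HTML_WHITESPACE = new Set(['\t', '\n', '\f', '\r', ' ']);

// Hypothetical helper: true if `text` (what sits between <head> and the
// next tag) contains any character the parser would treat as content,
// which makes it leave <head> and implicitly open <body>.
function opensBodyEarly(text) {
  return [...text].some((ch) => !HTML_WHITESPACE.has(ch));
}

console.log(opensBodyEarly('\n    '));       // false: stays in <head>
console.log(opensBodyEarly('\u200B\n    ')); // true: ZWSP counts as content
console.log(opensBodyEarly('\u00A0'));       // true: so does NO-BREAK SPACE
```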

<html lang="en"><head></head><body>​
    <meta charset="UTF-8">

However, there may be a bug in the JSDOM HTML parser here, as a <title> element should have been implicitly created in the <head> even when the <body> tag is opened early due to encountering content. Since the <head> ends up empty, though, webhint's hint-meta-charset-utf-8 hits an error when it calls getLocation on the first child of <head>, which doesn't exist:

            const firstHeadElement = document.querySelectorAll('head :first-child')[0];
            const isCharsetMetaFirstHeadElement = charsetMetaElement && firstHeadElement && charsetMetaElement.isSame(firstHeadElement);

            const headElementContent = document.querySelectorAll('head')[0].outerHTML;
            const isMetaElementFirstHeadContent = (/^<head[^>]*>\s*<meta/).test(headElementContent);

            if (!isCharsetMetaFirstHeadElement || !isMetaElementFirstHeadContent) {

                const severity = (firstHeadElement.getLocation().endOffset || 0) <= 1024 ?
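To see why this throws, here is a plain Node.js stand-in: an empty array plays the role of the empty `querySelectorAll('head :first-child')` result, so indexing `[0]` yields `undefined`, and calling a method on it raises a `TypeError`. `unguardedEndOffset` is a hypothetical wrapper around the unguarded expression, not webhint code.

```javascript
// Hypothetical wrapper around the unguarded expression from the hint.
function unguardedEndOffset(firstHeadElement) {
  return firstHeadElement.getLocation().endOffset || 0;
}

// An empty <head> means 'head :first-child' matches nothing,
// so taking [0] of the empty result gives undefined.
const firstHeadElement = [][0];

try {
  unguardedEndOffset(firstHeadElement);
} catch (err) {
  console.log(err instanceof TypeError); // true
  // Exact wording varies by Node.js version.
  console.log(err.message);
}
```

On the Node.js version in the original report, the message is the same "Cannot read property 'getLocation' of undefined" that appears in the AnalyzerError.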

I can't say for sure that the missing <title> in this case is a bug without digging through the parsing algorithm in the HTML standard some more, but I can say that hint-meta-charset-utf-8 should be more robust regardless. A small update to the check to handle a missing firstHeadElement should be all that's needed to prevent the error:

                const severity = (firstHeadElement.getLocation().endOffset || 0) <= 1024 ?

to

                const severity = (firstHeadElement?.getLocation().endOffset || 0) <= 1024 ?
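A quick check of the guarded expression, again as a plain Node.js sketch (optional chaining needs Node.js 14 or later). `endOffsetOrZero` and `fakeMeta` are hypothetical stand-ins; only `getLocation()` is modelled.

```javascript
// Hypothetical wrapper around the guarded expression from the proposed fix.
function endOffsetOrZero(firstHeadElement) {
  return firstHeadElement?.getLocation().endOffset || 0;
}

// Minimal stand-in for an element; only getLocation() is modelled.
const fakeMeta = { getLocation: () => ({ endOffset: 42 }) };

console.log(endOffsetOrZero(fakeMeta));          // 42
console.log(endOffsetOrZero(undefined));         // 0
console.log(endOffsetOrZero(undefined) <= 1024); // true
```

Because `?.` short-circuits the rest of the chain (`.getLocation().endOffset`), a missing element evaluates to `undefined`, and `|| 0` turns that into `0`, so the `<= 1024` comparison runs instead of throwing. Which severity that branch then picks is cut off in the excerpt above, so it isn't modelled here.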