nhoizey / vscode-gremlins

Gremlins tracker for Visual Studio Code: reveals invisible whitespace and other annoying characters

Home page: https://marketplace.visualstudio.com/items?itemName=nhoizey.gremlins

[Bug] \u3164 not detected as gremlin

timkrins opened this issue

Describe the bug
The Unicode character \u3164 (HANGUL FILLER) is not detected as a gremlin.
See https://certitude.consulting/blog/en/invisible-backdoor/ for a great article on this character (and my inspiration for this bug report).

To Reproduce
Steps to reproduce the behavior:

  1. View a file containing \u3164
  2. Observe that the character is not marked as a gremlin

Example code (from article above)

const { timeout,ㅤ} = req.query;
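Why this line is dangerous: U+3164 counts as a letter for JavaScript identifiers, so the trailing `ㅤ` in the destructuring pattern quietly pulls an extra query parameter into a variable whose name is invisible. A minimal sketch, assuming a plain object stands in for Express's `req.query` and spelling the invisible key as `"\u3164"` so it shows up in the listing:

```javascript
// A plain object standing in for req.query (an assumption for this sketch).
// The second key is U+3164 HANGUL FILLER, which renders as blank space.
const query = { timeout: "5000", "\u3164": "id" };

// Same effect as `const { timeout,ㅤ} = query;`, but the invisible property
// name is written as an escaped, computed key so it is visible here.
const { timeout, ["\u3164"]: hidden } = query;

console.log(timeout); // prints 5000
console.log(hidden); // prints id (a value the caller fully controls)
```

In the article's example, that hidden variable is later slipped into a list of commands to execute, which is what turns an invisible character into a backdoor.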

Expected behavior
The \u3164 whitespace is detected as a gremlin.
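To make "detected" concrete, here is a minimal sketch of the kind of check being asked for: scan the text and report each U+3164 with its position. `findHangulFillers` is a hypothetical helper, not code from the Gremlins extension.

```javascript
// Hypothetical helper (not part of the Gremlins extension): report every
// U+3164 HANGUL FILLER in a piece of source text as 1-based line/column.
function findHangulFillers(text) {
  const hits = [];
  text.split("\n").forEach((line, i) => {
    // Iterate by code point; the column counts code points, not UTF-16 units.
    let column = 1;
    for (const ch of line) {
      if (ch === "\u3164") hits.push({ line: i + 1, column });
      column += 1;
    }
  });
  return hits;
}

const source = "const { timeout,\u3164} = req.query;";
console.log(findHangulFillers(source)); // prints [ { line: 1, column: 17 } ]
```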

Screenshots
[Screenshot 2021-11-10 at 10.29.00]

Operating system:

  • OS: macOS
  • Version 11.5.2

Visual Studio Code:

  • Version 1.61.2

Gremlins extension:

  • Version 0.26.0

Hey @timkrins, thanks for the suggestion. This looks like a reasonable addition. As a workaround for now, you can define a custom set of rules in your VS Code settings and add this character there. When you start editing the gremlins.characters setting, it should automatically fill in all of the default entries for you.

I just read the article too and came here to create the issue, so thanks a lot @timkrins for creating it first! 🙏

Do you have time to open a pull request for this addition?

@nhoizey Can do. What level should we mark it as?

There are actually a huge number of Unicode "confusables".

Just for whitespace, there are:
0x1680 OGHAM SPACE MARK
0x2000 EN QUAD
0x2001 EM QUAD
0x2002 EN SPACE
0x2003 EM SPACE
0x2004 THREE-PER-EM SPACE
0x2005 FOUR-PER-EM SPACE
0x2006 SIX-PER-EM SPACE
0x2007 FIGURE SPACE
0x2008 PUNCTUATION SPACE
0x2009 THIN SPACE
0x200A HAIR SPACE

I wonder whether there is a way to flag any Unicode confusable.
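For the invisible subset at least, one approximation is to rely on Unicode's own character properties instead of a hand-maintained list. A sketch using JavaScript's Unicode property escapes; which properties to flag, and the small allow-list of ordinary ASCII whitespace, are choices made for this sketch, not something the thread settled on. Note that U+3164 is not `White_Space`, but it is `Default_Ignorable_Code_Point`, so checking whitespace alone would miss it.

```javascript
// Flag characters Unicode classifies as whitespace or as default-ignorable,
// except the ordinary ASCII whitespace a source file legitimately needs.
const suspicious = /[\p{White_Space}\p{Default_Ignorable_Code_Point}]/u;
const allowed = new Set([" ", "\t", "\n", "\r"]);

function isSuspicious(ch) {
  return !allowed.has(ch) && suspicious.test(ch);
}

// Spread the string into code points and keep only the suspicious ones.
const flagged = [..."let\u3164x = 1;\u2009"]
  .filter(isSuspicious)
  .map((c) => c.codePointAt(0).toString(16));
console.log(flagged); // prints [ '3164', '2009' ]
```

This catches invisible characters such as the spaces listed above, zero-width characters, and the Hangul fillers; it does not catch visible lookalikes (for example Cyrillic "а" versus Latin "a"), which is what a confusables table is for.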

@timkrins I don't know of a definitive way to classify Unicode characters as "confusables" automatically. For this group, though, you could at least configure a range to capture most of them. @sheldonhull recently opened PR #185 to add instructions for doing so to the README.

I can see that @alexdima has created an issue in microsoft/vscode to provide this kind of functionality natively (with the task assigned to @hediet in the November iteration plan): microsoft/vscode#136437
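The range idea is easy to picture for the whitespace listed earlier in the thread: apart from U+1680, every one of those characters falls in the contiguous block U+2000 to U+200A. A minimal sketch of such a range check; `isListedSpace` and `listedSpaces` are hypothetical helpers, not the extension's configuration format.

```javascript
// Code points from the list above: U+1680 plus the contiguous block
// U+2000..U+200A (EN QUAD through HAIR SPACE).
const isListedSpace = (cp) => cp === 0x1680 || (cp >= 0x2000 && cp <= 0x200a);

// Hypothetical helper: return each offending character as "U+XXXX".
function listedSpaces(text) {
  const found = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (isListedSpace(cp)) {
      found.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return found;
}

console.log(listedSpaces("a\u2009b\u1680c d")); // prints [ 'U+2009', 'U+1680' ]
```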

@TheSench there is a list of them here: https://www.unicode.org/Public/security/14.0.0/confusables.txt

License for Unicode data files is here: https://www.unicode.org/license.txt
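If someone wanted to build on that file, a first step is parsing it. A minimal sketch, assuming the semicolon-separated layout described in the file's header (source code point, target code point sequence, type field, then a `#` comment); `parseConfusables` is hypothetical, and the sample lines are illustrative rather than copied from the real file.

```javascript
// Convert a space-separated list of hex code points into a string.
const hexToText = (hex) =>
  String.fromCodePoint(...hex.trim().split(/\s+/).map((h) => parseInt(h, 16)));

// Hypothetical parser for the confusables.txt layout:
//   <source hex> ; <target hex sequence> ; <type>   # comment
function parseConfusables(text) {
  const map = new Map();
  for (const rawLine of text.split("\n")) {
    // Strip a leading byte-order mark, then everything after the first "#".
    const line = rawLine.replace(/^\uFEFF/, "").split("#")[0].trim();
    if (line === "") continue; // blank or comment-only line
    const [source, target] = line.split(";");
    map.set(hexToText(source), hexToText(target));
  }
  return map;
}

// Illustrative input in that layout (not copied from the real file).
const sample = [
  "# confusables.txt (header comment)",
  "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
  "2009 ;\t0020 ;\tMA\t# THIN SPACE -> SPACE",
].join("\n");

const confusables = parseConfusables(sample);
console.log(confusables.get("\u0430")); // prints a
```

A checker could then flag any character that appears as a key in the map. Check the license linked above before bundling the data.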

Thanks for the links, I'll take a look into those. I'd love to see this become a feature of VSCode itself, but until that comes, we'll see what can be done here.

Greetings. I found my way to this issue after reading a post by Chris Coyier titled "The Invisible JavaScript Backdoor", which in turn linked to the source article of the same title by Wolfgang Ettlinger.

I've already extended my local gremlins.characters array with the following:

"3164": {
  "description": "'HANGUL FILLER'",
  "level": "error"
}

But not everybody will know about this problem, so I feel it should be included in the extension's built-in list of gremlin characters.

Is there any plan for this at the moment, or is it waiting for more information or motivation?
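The keys in gremlins.characters are hexadecimal code points, as in the `"3164"` entry above. If you only have the character itself, a small helper can produce an entry in the same shape. `gremlinEntry` is hypothetical, and the uppercase, zero-padded key format is an assumption based on that single entry, so compare it with the defaults the setting fills in.

```javascript
// Hypothetical helper: turn a character into a gremlins.characters-style
// entry keyed by its hex code point (key format assumed from "3164" above).
function gremlinEntry(ch, description, level = "error") {
  const cp = ch.codePointAt(0);
  const key = cp.toString(16).toUpperCase().padStart(4, "0");
  return { [key]: { description, level } };
}

// Prints the same entry as the settings fragment above.
console.log(JSON.stringify(gremlinEntry("\u3164", "'HANGUL FILLER'"), null, 2));
```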

@ZaLiTHkA See the activity in the microsoft/vscode issue linked above about making this Unicode-flagging feature available natively in VS Code.

Since the November 2021 release of VS Code (version 1.63), Unicode highlighting is built in!
See https://code.visualstudio.com/updates/v1_63#_unicode-highlighting in the changelog.
Thanks @nhoizey and gremlins, you were great.