PrincetonUniversity / blocklint

Expand the default block-list

JakeSummers opened this issue

Good Morning!

This is a pretty nifty package. I would be interested in starting to use it.

One current limitation of this tool is that the default block-list is pretty limited:

    parser.add_argument('--blocklist', help='Comma separated list of words '
                        'to lint in any context, with possibly special '
                        'characters between, case insensitive; '
                        'DEFAULT to master,slave,whitelist,blacklist')
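For anyone wondering what "possibly special characters between, case insensitive" could look like in practice, here is a minimal sketch. It is only my reading of the help text, not blocklint's actual implementation; `build_pattern` and `find_blocked` are hypothetical names, and the choice of which characters count as "special" is an assumption.

```python
import re

# Default list as stated in the --blocklist help text.
DEFAULT_BLOCKLIST = "master,slave,whitelist,blacklist"


def build_pattern(word):
    """Compile a case-insensitive regex for `word`.

    Assumption: "special characters" means anything that is not a letter,
    digit, or whitespace (so `black_list` and `black-list` both match).
    """
    separator = r"[^a-zA-Z0-9\s]*"
    return re.compile(
        separator.join(re.escape(ch) for ch in word), re.IGNORECASE
    )


def find_blocked(line, blocklist=DEFAULT_BLOCKLIST):
    """Return (word, column_index, matched_text) for every hit in `line`."""
    words = [w.strip() for w in blocklist.split(",") if w.strip()]
    hits = []
    for word in words:
        for match in build_pattern(word).finditer(line):
            hits.append((word, match.start(), match.group(0)))
    return hits


print(find_blocked("use_black-list = True"))
# [('blacklist', 4, 'black-list')]
```

The column index here is zero-based; a real linter would likely report one-based positions.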

This tool would be significantly more useful if it came packaged with a more extensive block-list. Right now, I would need to write my own block-list and get it code-reviewed (which I anticipate will be difficult).

In the README, Alex.js is cited as inspiration:

This project is inspired by [Alex.js](https://alexjs.com).

I did a quick look and it seems like alexjs comes with a very comprehensive block-list via the retext-equality npm package. The full block-list is here: https://github.com/retextjs/retext-equality/tree/main/data/en

They also provide acceptable alternatives (with sources :) ) so that you can create output like this:

example.md
   1:5-1:14  warning  `boogeyman` may be insensitive, use `boogeymonster` instead                boogeyman-boogeywoman  retext-equality
  1:42-1:48  warning  `master` / `slaves` may be insensitive, use `primary` / `replica` instead  master-slave           retext-equality
  1:69-1:75  warning  Don’t use `slaves`, it’s profane                                           slaves                 retext-profanities
  2:52-2:54  warning  `he` may be insensitive, use `they`, `it` instead                          he-she                 retext-equality
  2:61-2:68  warning  `cripple` may be insensitive, use `person with a limp` instead             gimp                   retext-equality

⚠ 5 warnings

It would be awesome if we could do the following:

  1. Copy the data from https://github.com/retextjs/retext-equality/tree/main/data/en into this repo
  2. Use that as the default block-list
  3. Add support for suggesting alternatives.
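To make step 3 a bit more concrete, here is a rough sketch of how a suggested alternative could be attached to each match. Everything in it is hypothetical: the `ALTERNATIVES` mapping and the `suggest` function are not part of blocklint, the entries are illustrative rather than copied from retext-equality, and the message wording only loosely follows the alexjs output above.

```python
import re

# Hypothetical mapping from blocked word to suggested replacements.
# Illustrative only; not taken from blocklint or retext-equality data files.
ALTERNATIVES = {
    "master": ["primary"],
    "slave": ["replica"],
    "whitelist": ["allowlist"],
    "blacklist": ["blocklist"],
}


def suggest(line, line_number, alternatives=ALTERNATIVES):
    """Return one warning string per blocked word found in `line`.

    Columns are one-based. Warnings are grouped by word, not sorted
    by position.
    """
    warnings = []
    for word, options in alternatives.items():
        for match in re.finditer(re.escape(word), line, flags=re.IGNORECASE):
            use = ", ".join(f"`{option}`" for option in options)
            warnings.append(
                f"{line_number}:{match.start() + 1}: "
                f"`{match.group(0)}` may be insensitive, use {use} instead"
            )
    return warnings


for warning in suggest("git checkout Master", 3):
    print(warning)
# 3:14: `Master` may be insensitive, use `primary` instead
```

Keeping the alternatives in a plain mapping would let users override or extend them the same way they can already override the block-list.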

It's a good point, but I have a few reservations. I made this unopinionated almost on purpose so that others could customize it as needed. Adding suggested alternatives may be within scope, though it would be a larger change. Here are my concerns with the full alexjs list:

  • Many examples are field specific. E.g. islamists probably won't come up in most software development.
  • Many examples are multiple words/phrases and wouldn't directly translate.
  • Some are still valid in the context of development (primitive, bugreport).

Overall, I think including all the inconsiderate words would add bloat for checking source code specifically. Someone who uses slurs in their code probably won't care if this tool complains. Legacy usage of something like blacklist or master is mostly what I wanted to catch. For markdown, I'd also run alexjs to catch offensive phrases and language.

Here's what I'd propose.

  1. make a blocklint config file that includes all single-word entries in the alexjs database
  2. check if linting with that list drastically increases runtime (it may, my regexes are fairly complex)
  3. add a --strict switch which will use the strict config file

So users would have the option to specify the strict switch, or to copy the file from GitHub and modify it as they see fit. I'd be open to a PR for adding a reason, but that would require a lot of rewrites.
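The three proposed steps could be prototyped roughly as follows. This is a sketch under stated assumptions, not existing blocklint code: the `--strict` flag is only the proposal above, all function names are hypothetical, the sample entries are just words that appear earlier in this thread, and the timing helper uses a simple alternation regex rather than blocklint's real (more complex) patterns.

```python
import argparse
import re
import time

# Default list as stated in the --blocklist help text.
DEFAULT_WORDS = ["master", "slave", "whitelist", "blacklist"]


def single_word_entries(entries):
    """Step 1: keep only entries made of letters alone (no spaces or hyphens)."""
    return [entry for entry in entries if re.fullmatch(r"[A-Za-z]+", entry)]


def time_lint(words, lines):
    """Step 2: count matching lines and measure how long the scan takes."""
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    start = time.perf_counter()
    hits = sum(1 for line in lines if pattern.search(line))
    return hits, time.perf_counter() - start


def build_parser():
    """Step 3: a hypothetical --strict switch (not an existing blocklint flag)."""
    parser = argparse.ArgumentParser(prog="blocklint-sketch")
    parser.add_argument("--strict", action="store_true",
                        help="use the larger strict word list")
    return parser


def choose_words(args, strict_words):
    """Pick the strict list when --strict is given, else the defaults."""
    return strict_words if args.strict else DEFAULT_WORDS


if __name__ == "__main__":
    # Sample entries taken from words mentioned in this thread.
    sample = ["boogeyman", "cripple", "person with a limp", "master"]
    strict_words = single_word_entries(sample)
    args = build_parser().parse_args(["--strict"])
    words = choose_words(args, strict_words)
    hits, seconds = time_lint(words, ["master branch", "main branch"])
    print(words, hits, f"{seconds:.6f}s")
```

Comparing `time_lint` on the default list against the strict list over a large checkout would give a first indication of whether the longer list drastically increases runtime.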