Re-introduce english words to cell IDs

choldgraf opened this issue 3 years ago

Background

In #216 we decided that the "English-friendly" word lists we were using to generate cell IDs were problematic, because certain combinations of words could come out derogatory or otherwise offensive. We could not find a suitable alternative, so we opted for randomly generated IDs instead.

Recently @betatim shared with me a reference to the glitch word repository, which seems like it could be a good word list to use if we wish to reinstate English-friendly words.

It is MIT-licensed, and the README describes this process for choosing their words:

The words are pulled from curated files. We want the words and their pairings to be friendly, positive, inspiring, whimsical, memorable, etc. They should also be words that most people can easily remember and spell.

All of the words and their generated pairings should be safe for children of all cultures. This means that we permit absolutely no word pairings that invoke hate speech, hostility, derogatory terms, etc.

Despite our best efforts, it's easy for a pair of benign words to be combined into something inappropriate. Whenever we notice a generated pair that is problematic, we'll remove at least one of the words from that pair so that it won't reoccur. We'll err on the side of trusting reports and removing potentially inappropriate words rather than defending the appropriate uses of a word.

When adding words to the list, an abundance of common sense is required. If the word can be used as a slang term for an ethnicity or nationality, there's probably a context where it'll pair up with a verb or adjective that can make it feel unwelcome... so be mindful and avoid those.
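To make the process described in the README concrete, here is a minimal sketch of how word-pair IDs might be generated and how a reported pair could be pruned. The word lists, the reported pair, and the names `prune` and `word_id` are all hypothetical and chosen for illustration; this is not the glitch repository's actual code or data.

```python
import random

# Hypothetical curated lists; a real list would be far longer.
ADJECTIVES = ["brave", "calm", "sunny"]
NOUNS = ["river", "maple", "comet"]

# Hypothetical pairs that someone reported as problematic.
REPORTED_PAIRS = {("calm", "comet")}


def prune(adjectives, nouns, reported_pairs):
    """Drop the noun of every reported pair so that pair cannot recur."""
    blocked_nouns = {noun for _, noun in reported_pairs}
    kept_nouns = [noun for noun in nouns if noun not in blocked_nouns]
    return list(adjectives), kept_nouns


def word_id(adjectives, nouns, rng=random):
    """Join one random adjective and one random noun with a hyphen."""
    return f"{rng.choice(adjectives)}-{rng.choice(nouns)}"


adjectives, nouns = prune(ADJECTIVES, NOUNS, REPORTED_PAIRS)
print(word_id(adjectives, nouns))
```

Removing a whole word rather than blocking a single pair matches the README's "remove at least one of the words" policy, at the cost of shrinking the pool of possible IDs with every report.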

Wes Turner · Answer 1 · Thu Jun 17 2021 03:33:08 GMT+0800 (China Standard Time)

However benign and free of latent content such a usability change might be, it would be distracting and unnecessary.

Edit:

nb["green-ideas-furiously"] # variable declarations
nb["dont-pink-elephant-this-book"] # pd.read_csv
nb["clean-language-cognitive-metaphor"] # display(output["summary_table"])

# or
nb["jgdd543xus"] # variable declarations
nb["64abcf=\+%22"] # pd.read_csv
nb[uuid.uuid4()] # display(output["summary_table"])
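
For comparison, a randomly generated ID needs no word list at all. The sketch below is only an illustration: the function name `random_cell_id` and the 8-character length are assumptions, not necessarily what the notebook format library does.

```python
import uuid


def random_cell_id(length=8):
    """Return the first `length` hex characters of a random UUID."""
    return uuid.uuid4().hex[:length]


print(random_cell_id())
```

An ID built this way contains only the characters 0-9 and a-f, so it cannot accidentally spell out a phrase in any language.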

Jason Grout · Answer 2 · Sat Aug 27 2022 17:23:32 GMT+0800 (China Standard Time)

I agree with Wes: I think we should not use English words as IDs. IDs are largely invisible to the user, and the potential for a problem is too great. Besides, why should we use English words when Jupyter notebooks are used all around the world?

Chris Holdgraf · Answer 3 · Sat Aug 27 2022 17:45:47 GMT+0800 (China Standard Time)

I don't feel strongly about this, and unless somebody suggests a strong reason this would be beneficial, I think we should close it.

Michał Krassowski · Answer 4 · Sun Oct 23 2022 02:05:55 GMT+0800 (China Standard Time)

Is this a duplicate of #218?

Chris Holdgraf · Answer 5 · Sun Oct 23 2022 16:10:33 GMT+0800 (China Standard Time)

Not sure - but I'm just going to close this anyway because it seems like there isn't agreement that it's a good idea in the first place.