gender-bias / gender-bias

Reading for gender bias

Nouns

molliem opened this issue · comments

Letters for women are more likely to use adjectives instead of nouns

Goal: Develop code that can read text for the presence of nouns that highlight roles/positions (like leader, researcher). If position nouns are absent, return a summary statement that directs the author to consider using nouns to strengthen the letter.

This one can be complicated. The goal is to distinguish position nouns from descriptions that rely on adjectives or verbs, or that weaken the position noun (e.g., "she was involved in research", "she taught").
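As a rough starting point, the check described above could be sketched like this. The word list, the function names, and the wording of the summary are all placeholders (assumptions for illustration, not anything the project has settled on), and simple word matching like this will not catch the weakened phrasings yet.

```python
import re

# Hypothetical starter list; the real word list would need to be curated.
POSITION_NOUNS = {"leader", "researcher", "scientist", "expert", "teacher"}


def find_position_nouns(text, nouns=POSITION_NOUNS):
    """Return the sorted position nouns found in text (singular or simple plural)."""
    words = re.findall(r"[a-z]+", text.lower())
    found = set()
    for word in words:
        if word in nouns:
            found.add(word)
        elif word.endswith("s") and word[:-1] in nouns:
            # Treat a trailing "s" as a simple plural, e.g. "leaders".
            found.add(word[:-1])
    return sorted(found)


def noun_feedback(text):
    """Return a summary statement for the letter writer."""
    found = find_position_nouns(text)
    if found:
        return "Position nouns found: " + ", ".join(found)
    return (
        "No position nouns found. Consider using nouns such as "
        "'leader' or 'researcher' to strengthen the letter."
    )
```

Calling `noun_feedback("She was involved in research and she taught.")` returns the "No position nouns found" message, because neither "research" nor "taught" is in the placeholder list.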

This project sounds amazing, congrats on pitching such a cool idea. I'm going to start putting together a script to track the frequency of nouns and adjectives in different letters, and potentially play with sentiment analysis to figure out whether letters for males show a stronger positive sentiment, which would indicate a higher use of superlatives. I do most of my coding in Python; is that going to be a problem?

I am also interested in working on this problem. One way to approach this is to make a list of relevant nouns and their corresponding verbs and check the relative frequencies of these (e.g., if "leads" or "led" is used more than "leader"). That seems like a fairly simple first step and I could work on that. It's probably also a good idea to use POS tagging to detect passive voice, as that would catch things like "was involved".
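A minimal sketch of the relative-frequency idea, using only the standard library. The noun/verb pairs and the function name are made up for illustration. Without POS tagging it will miscount ambiguous words ("research" can be a noun, "leading" an adjective), so treat it as a first pass only.

```python
import re
from collections import Counter

# Hypothetical pairs: position noun -> verb forms that describe the same role.
NOUN_VERB_FORMS = {
    "leader": {"lead", "leads", "led", "leading"},
    "researcher": {"research", "researches", "researched", "researching"},
    "teacher": {"teach", "teaches", "taught", "teaching"},
}


def noun_vs_verb_counts(text):
    """Map each position noun to a (noun_count, verb_count) tuple."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    result = {}
    for noun, verbs in NOUN_VERB_FORMS.items():
        # Count the singular and the simple plural of the noun.
        noun_count = counts[noun] + counts[noun + "s"]
        verb_count = sum(counts[verb] for verb in verbs)
        result[noun] = (noun_count, verb_count)
    return result
```

A letter where the verb count is well above the noun count for a given role (for example, "led" and "leads" but never "leader") would be a candidate for feedback.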

What programming language are we using? I'm most comfortable with Python, though I've done some coding in Perl (I know other languages, but I don't think any of them would be good for this sort of problem). What sort of POS tagging would we use? I've used TreeTagger for Python and Lingua::EN::Tagger for Perl, but I know Python's NLTK has several POS taggers built in. I've also used spaCy a little bit, but I'm less familiar with that.

Python is perfect! That is the language I know best! I'm still learning programming, so that isn't saying much! I am working on setting up a website (www.biascorrect.com). Hoping to have that ready to go by Thursday.

Feel free to use the POS tagging you are most comfortable with! Please remember to add your names to the contributors page as well. I want to be certain to recognize all the contributions.

Thank you both for the kind words, support, and help!

@molliem — love love love this project, and looking forward to helping!

I did a quick search of the repo and it doesn't look like anyone has mentioned proselint here. It is a general prose-checking framework (with tips like weasel_words.very: "don't use the word 'very'", or typography.symbols.curly_quotes: "use curly quotes “”, not straight quotes """), and the needs of this project reminded me of proselint's plugin-based architecture.

In short, each 'plugin' has its own rules, ways of checking, and error messages, and each is completely independent of the others. So an adjectives_vs_nouns plugin can use a totally different technology to check for bias than a stereotypes plugin.

Thought I'd drop the link here in case it's a useful reference, but in the meantime, looking forward to getting started wherever is most helpful!

@j6k4m8 Thank you for the kind words! And thanks for the link to proselint!! I hadn't heard of it and it is a fabulous reference!

Although we haven't been tackling this project as plugins, our approach feels similar. I divided the issues up into topics and people have been working on scripts for each topic. My plan at the end is to use a wrapper or a for loop to combine the separate topics into one.
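For what it's worth, the "wrapper or for loop" idea could look roughly like the sketch below. The two check functions, their word lists, and their messages are stand-ins I made up; the only point is that each topic stays an independent function and the wrapper just loops over them.

```python
import re


def _words(text):
    """Lowercased set of the words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def check_nouns(text):
    """Hypothetical check: suggest position nouns when none are present."""
    if _words(text) & {"leader", "researcher"}:
        return []
    return ["Consider using position nouns such as 'leader' or 'researcher'."]


def check_superlatives(text):
    """Hypothetical placeholder check for the superlatives topic."""
    if _words(text) & {"best", "outstanding", "exceptional"}:
        return []
    return ["No superlatives found."]


# One entry per topic; adding a topic means adding one function here.
CHECKS = {"nouns": check_nouns, "superlatives": check_superlatives}


def run_all_checks(text, checks=CHECKS):
    """Run every topic check and collect its feedback messages by topic."""
    report = {}
    for topic, check in checks.items():
        report[topic] = check(text)
    return report
```

Each check takes the letter text and returns a list of messages, so the scripts people have already written per topic would only need a thin function around them to fit in.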

Do you know Python? There are four issues that no one has tackled: superlatives, family life, minimal assurance, and raises doubt. Help on any of those would be great. If you know web design, I could use some help there too. The website is pretty plain right now.

Thanks for reaching out! Excited to have you join the team!

Python or web-design or both! Up to you, wherever you'd prefer to have more help!

Amazing! It would be great if you could work on family life, minimal assurance or raises doubt (any of them). My goal was to identify the presence of words and phrases associated with these areas and give feedback, but also to highlight the words in the text box. If you need help with word lists, I can probably tackle that this weekend.
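Here is one possible sketch of the find-and-highlight step. The phrase list is invented for illustration (the real lists would come from the word lists mentioned above), and wrapping matches in `<mark>` tags is just an assumption about how the text box might display them.

```python
import re

# Hypothetical phrases for a "raises doubt" check; not a vetted list.
DOUBT_PHRASES = ["it appears", "might", "although"]


def highlight(text, phrases, open_tag="<mark>", close_tag="</mark>"):
    """Wrap each matched phrase in tags; return (marked_text, matches)."""
    if not phrases:
        return text, []
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    found = []

    def wrap(match):
        found.append(match.group(0))
        return open_tag + match.group(0) + close_tag

    return pattern.sub(wrap, text), found
```

The list of matches can drive the feedback message, while the marked-up text goes back to the page. Note that this does not escape any HTML already present in the letter, which a real web version would need to handle.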