Distinguish between overt pronoun / gender identification by a Twitter user and inferred gender in results

Question

cailyoung opened this issue 6 years ago

https://github.com/ajdavis/twitter-gender-distribution/blob/221175cdd42189d091e3373d4b30437bcd753ef4/analyze.py#L107

                g = detector.get_gender(name, country)
                if g != 'andy':
                    # Not androgynous.
                    break

Hi there. The above section (and others that return values from the lookup table rather than taking a user's expressed gender/pronoun into consideration) should probably be categorised differently in the results table, perhaps as 'probably male based on name' rather than being included in the 'male' column. Otherwise you are saying that the name-to-gender mapping you have used is absolutely accurate for all people who share a given name in the list.
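A minimal sketch of the distinction described above, assuming each result records where its label came from. The names `Classification`, `classify`, and `PRONOUN_MAP`, and the pronoun patterns, are hypothetical and not taken from analyze.py; `guess_from_name` is a stand-in for a lookup such as `detector.get_gender(name, country)`.

```python
import re
from typing import Callable, NamedTuple

# Hypothetical mapping from declared pronouns to result categories.
PRONOUN_MAP = {
    "he/him": "male",
    "she/her": "female",
    "they/them": "nonbinary",
}


class Classification(NamedTuple):
    gender: str  # "male", "female", "nonbinary", or "unknown"
    source: str  # "declared", "inferred", or "none"


def classify(description: str, name: str,
             guess_from_name: Callable[[str], str]) -> Classification:
    """Prefer pronouns declared in the profile; fall back to a name guess."""
    text = description.lower()
    for pronouns, gender in PRONOUN_MAP.items():
        first, second = pronouns.split("/")
        # Match e.g. "she/her" or "she / her" as whole words.
        if re.search(rf"\b{first}\s*/\s*{second}\b", text):
            return Classification(gender, "declared")

    # No declared pronouns: the label is only a guess based on the name.
    guess = guess_from_name(name)
    if guess in ("male", "female"):
        return Classification(guess, "inferred")
    return Classification("unknown", "none")
```

With the `source` field kept alongside the label, a results table could show 'male (declared)' and 'male (inferred from name)' as separate columns, or merge them, without losing the distinction.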

A. Jesse Jiryu Davis · Answer 1 · Tue Sep 18 2018 16:06:25 GMT+0800 (China Standard Time)

I am certainly not claiming absolute accuracy. =) I think it would be useful in the results page to display, in small text, "12% declared pronouns" to indicate what percent of each category of accounts have declared pronouns. The genders of the rest are guessed. Would you like to make a PR?
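As a rough sketch of the "12% declared pronouns" figure, assuming each analyzed account has already been reduced to a `(gender, source)` pair: the function name `declared_share` and the `"declared"` / `"inferred"` labels are hypothetical, not part of the existing code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def declared_share(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per gender category, the percentage of labels that were declared.

    Each item is (gender, source), where source is "declared" or "inferred".
    """
    totals: Counter = Counter()
    declared: Counter = Counter()
    for gender, source in results:
        totals[gender] += 1
        if source == "declared":
            declared[gender] += 1
    # Every key in totals has a count of at least 1, so no division by zero.
    return {gender: 100.0 * declared[gender] / totals[gender]
            for gender in totals}
```

The small-text note for a category could then be formatted with something like `f"{shares['male']:.0f}% declared pronouns"`.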

Cail Young · Answer 2 · Mon Sep 24 2018 07:00:27 GMT+0800 (China Standard Time)

I'm very rusty in Python, but I could attempt a PR. Might take a while 😄

I think fundamentally we might disagree on whether it's appropriate to be grouping results at all, though. I don't believe any automated tool can state with any confidence that an individual's gender is any particular value unless that individual has explicitly stated it. So the counts based on pronouns are reasonable, but including inferred gender based on name is inappropriate in my opinion.

Any PR I make would alter the output table to make this distinction clear. Would that be OK?

A. Jesse Jiryu Davis · Answer 3 · Mon Sep 24 2018 07:08:25 GMT+0800 (China Standard Time)

Inferring gender based on name is the main function of this tool.

I'm curious what percentage of the people you follow on Twitter, or who follow you, declare their pronouns; it's my experience that very few Twitter users declare their pronouns in their profiles. Since the purpose of the tool is to help men like me follow more women and nonbinary people, I need a way to estimate the gender proportion of the maximum number of my Twitter contacts. I prefer an estimate of a large number of my contacts instead of a definitive count of a small number. Do you agree with that goal?

Given my goal, and how few people on Twitter announce their pronouns, I think the display should emphasize the estimate, rather than emphasize the number of people who announce their pronouns.

A. Jesse Jiryu Davis · Answer 4 · Sun Dec 23 2018 23:18:34 GMT+0800 (China Standard Time)

Here's what my results look like now: