cseanburns / peer_review_geography

R Code for Analysis of Peer Review Data on Geographical Bias

I collaborated on a project that investigated peer review and geographical and language bias for submissions to an ecology journal. The results are published in the following article:

Burns, C.S., & Fox, C.W. (2017). Language and socioeconomics predict geographic variation in peer review outcomes at an ecology journal. Scientometrics, 113(2), 1113-1127. doi:http://doi.org/10.1007/s11192-017-2517-5

Dryad, data, doi: http://doi.org/10.5061/dryad.5090r

Data notes

The files in this repository, numbered s[0-8]---.R, include the source code for the analysis. They may also include some exploratory code, including plots, that may not be reported in the final paper.

Data files: decisions.csv and decisions_2.csv

  • decisions.csv is the source file.
  • decisions_2.csv is a copy of decisions.csv in which variable names were lightly edited in LibreOffice Calc and the . notation for missing values was replaced with NA (to make the file R friendly).
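
As a minimal sketch of the . to NA conversion described above, base R's read.csv can treat . as missing at read time through its na.strings argument. The two-row sample file below is made up for illustration; only the column names are taken from the variable list, and this is not necessarily how decisions_2.csv was produced.

```r
# Hypothetical two-row sample that mimics the "." missing-value notation.
sample_csv <- tempfile(fileext = ".csv")
writeLines(c("ms_id,author_count,mean_review_score",
             "1,3,2.5",
             "2,.,."),
           sample_csv)

# Treat both "." and "NA" as missing so numeric columns are not read as text.
decisions <- read.csv(sample_csv, na.strings = c(".", "NA"))

# Inspect the column types and the NA values in the second row.
str(decisions)
```

Reading the file this way leaves the source file untouched, which may be preferable to editing values by hand in a spreadsheet.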

Variable definitions for decisions.csv and decisions_2.csv:

Variable Definition
sort: sequential list of numbers to sort by, if needed
ms_id: manuscript identification number
author_count: number of authors per manuscript
author_sex_ratio: ratio of female to male authors per manuscript
corr_auth_sex: sex of corresponding author: 0 male ; 1 female
first_auth_sex: sex of first author: 0 male ; 1 female
senior_auth_sex: sex of senior author: 0 male ; 1 female
submit_auth_sex: sex of submitting author: 0 male ; 1 female
corr_auth_first_auth: corresponding author is first author: 0 false ; 1 true
corr_auth_submit_auth: corresponding author is submitting author: 0 false ; 1 true
submit_auth_first_auth: submitting author is first author: 0 false ; 1 true
submit_auth_senior_auth: submitting author is senior author: 0 false ; 1 true
corr_auth_senior_auth: corresponding author is senior author: 0 false ; 1 true
no_auth_sex_ident: number of authors where sex has been identified
first_auth_geog: geographical region of first author
corr_auth_geog: geographical region of corresponding author
submit_auth_geog: geographical region of submitting author
senior_auth_geog: geographical region of senior author
handling_editor: handling editor ID
handling_editor_sex: sex of handling editor
handling_editor_geog: geographical region of handling editor
editor_seniority: seniority in years since PhD of editor
editor_years: years as handling editor
sex_ratio_reviewers: ratio of female to male reviewers per manuscript
mean_review_score: mean review score (lower better)
mean_reviewer_days_respond: mean number of days for reviewers to respond to request to review
mean_reviewer_days_review: mean number of days for reviewers to review manuscript
prop_reviewers_responding: proportion of reviewers responding to request for review
prop_reviewers_agreeing: proportion of reviewers agreeing to review
sent_for_review: sent for review: 0 No ; 1 Yes
max_review_score: maximum review score per manuscript
paper_rejected: paper rejected: 0 No ; 1 Yes
no_reviews_obtained: number of reviews received from reviewers
no_reviews_responded: number of reviewers responding to request for review
title_word_count: word count of manuscript title
abstract_word_count: word count of manuscript abstract
time_to_decision: time in days to decision on manuscript
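
To illustrate how the 0/1 codings above might be used, here is a small sketch in base R. The rows are invented for the example; only the column names and codings follow the definitions, and the summary shown is not one of the analyses from the paper.

```r
# Made-up rows; only the column names and 0/1 codings follow the definitions above.
decisions <- data.frame(
  ms_id           = 1:4,
  corr_auth_sex   = c(0, 1, NA, 1),
  sent_for_review = c(1, 1, 0, 1),
  paper_rejected  = c(0, 1, 1, 0)
)

# Label the sex coding (0 male ; 1 female) so tables and plots are readable.
decisions$corr_auth_sex <- factor(decisions$corr_auth_sex,
                                  levels = c(0, 1),
                                  labels = c("male", "female"))

# Proportion rejected among manuscripts that were sent for review.
reviewed <- decisions[decisions$sent_for_review == 1, ]
rejection_rate <- mean(reviewed$paper_rejected == 1)
```

Converting the codes to labeled factors early keeps the meaning of 0 and 1 attached to the data rather than to the reader's memory.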

Data file: author_decisions.csv

Whereas the data in decisions.csv groups variables by manuscript, author_decisions.csv unpacks manuscript data and contains extra observations about authors per manuscript.

Variable definitions for author_decisions.csv:

Variable Definition
manuscript_id: identification number for the manuscript
sort: sequential list of numbers to sort by, if needed
submit_year: year manuscript was submitted
submit_month: month manuscript was submitted
author_person_id: identification number for author (sequential)
author_order: author's order in byline
corresponding_author: author is corresponding author: 0 No ; 1 Yes ; 2 None
submitting_author: author is submitting author: 0 No ; 1 Yes
senior_author: author is senior author (last author): 0 No ; 1 Yes
missing_authors: any missing authors: 0 No ; 1 Yes
author_country: country of author based on byline
HDI: human development index score (see notes below)
language: language of author country (see notes below)
geographic region: geographic region of author
author_sex: sex of author (male/female)
prob_sex: probability that author_sex classification is true
author_institution: author's institution based on byline
manuscript_status: acceptance, revision, rejection status of manuscript
final_decision: final decision on manuscript
submit_date: submission date in %m/%d/%y format
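
Because submit_date is stored as text in %m/%d/%y format, it needs an explicit format string when converted to a Date in R. A minimal sketch, using two made-up date strings:

```r
# Made-up values in the %m/%d/%y format described above.
submit_date <- c("05/31/16", "12/01/15")

# %y is a two-digit year; R maps 00-68 to 20xx and 69-99 to 19xx.
parsed <- as.Date(submit_date, format = "%m/%d/%y")

# ISO 8601 strings are unambiguous and sort correctly as text.
iso_dates <- format(parsed, "%Y-%m-%d")
```

Strings that do not match the format become NA rather than raising an error, so it may be worth checking anyNA(parsed) after conversion.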

Notes on variable language in author_decisions.csv

Languages, CIA World Factbook:

  • In the CIA World Factbook, languages are listed in rank order by country.
  • If English is listed as an official language, then English is selected regardless of its rank. This is done to reduce bias in the analysis.
  • If the country is not listed, then an alternate site is used.

Notes and alternate sites for the following countries are listed below:

  • Cameroon: people speak over two dozen African languages (no dominant language), but English is listed as one of the official languages, so English is listed as the language for Cameroon.
  • Ghana lists English as an official language. English is used.
  • India: English
  • Namibia: English
  • Norway lists two versions of Norwegian: Bokmal Norwegian (official) and Nynorsk Norwegian (official). I reduced to Norwegian.
  • Rwanda: English.
  • Sri Lanka: English has special status in the constitution. English is used.
  • French Guiana: French from Wikipedia (5/31/2016)
  • Martinique: French from Wikipedia (5/31/2016)

Source: https://www.cia.gov/library/publications/the-world-factbook/fields/2098.html, accessed on May 31, 2016
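
The selection rule described in this section can be sketched as a small R function. The function name pick_language and its two arguments are hypothetical, the inputs are placeholders rather than Factbook entries, and manual adjustments such as reducing the two Norwegian variants to Norwegian are not covered.

```r
# listed:   languages for one country, in the rank order given by the source.
# official: logical vector of the same length, TRUE where a language is official.
# Both arguments are hypothetical inputs for this sketch.
pick_language <- function(listed, official) {
  # English wins whenever it is an official language, whatever its rank.
  if (any(listed == "English" & official)) {
    return("English")
  }
  # Otherwise fall back to the top-ranked language.
  listed[1]
}

# Placeholder countries, not real Factbook entries.
english_official     <- pick_language(c("Language A", "English"), c(TRUE, TRUE))
english_not_official <- pick_language(c("Language B", "English"), c(TRUE, FALSE))
```

Here english_official is "English" and english_not_official is "Language B", since English is only selected when it is flagged as official.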

Human Development index scores

HDI scores were collected in order to evaluate outcomes against socioeconomic information.

  • Cayman Islands is not listed. Since it is a British territory, we use the United Kingdom index.
  • Martinique is a territory of France, so we use the France index.
  • Monaco is an independent country, but does not have an HDI, so we use France, its closest neighbor.
  • New Caledonia is a part of France, so we use the France index.
  • Puerto Rico is part of the US, so we use the US index.
  • Svalbard and Jan Mayen are part of Norway, so we use the Norway index.
  • Taiwan is counted as China.
  • French Guiana uses the France index.

Source: http://hdr.undp.org/en/countries, accessed on May 27, 2016
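
The substitutions listed above amount to a lookup from a territory to the country whose index is used in its place. A minimal sketch in R follows; the names hdi_proxy and hdi_country are hypothetical, and the country spellings are assumptions that may not match those in author_decisions.csv.

```r
# Territory -> country whose HDI is used instead (spellings are assumptions).
hdi_proxy <- c(
  "Cayman Islands"         = "United Kingdom",
  "Martinique"             = "France",
  "Monaco"                 = "France",
  "New Caledonia"          = "France",
  "Puerto Rico"            = "United States",
  "Svalbard and Jan Mayen" = "Norway",
  "Taiwan"                 = "China",
  "French Guiana"          = "France"
)

# Return the country to look up in the HDI table; countries without a
# substitution are returned unchanged.
hdi_country <- function(country) {
  proxy <- hdi_proxy[country]
  unname(ifelse(is.na(proxy), country, proxy))
}

lookup <- hdi_country(c("Monaco", "Brazil", "Taiwan"))
```

In this example lookup is c("France", "Brazil", "China"). Keeping the substitutions in one named vector makes each choice visible and easy to revise.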
