The Rise and Fall of the Note: Changing Paper Lengths in ACM CSCW, 2000-2018
By R. Stuart Geiger, staff ethnographer, Berkeley Institute for Data Science, UC-Berkeley
This repo contains the code and data needed to reproduce the figures in a paper (arxiv link, publisher link) in Proceedings of the ACM on Human-Computer Interaction, the new journal venue for the proceedings of the ACM conference on Computer-Supported Cooperative Work (CSCW). The full study involved text analysis of copyrighted papers, which I cannot redistribute here. However, the notebook I used for processing the PDFs is available for reference at code/data-cleaning-processing.ipynb. A data file containing all the quantitative statistics for each paper is at data/cscw-pages-notext.csv. This file is loaded by code/analysis-viz.ipynb, which processes it to produce the statistics and graphs presented in the paper. This notebook can also be run interactively for free in the cloud with Binder, so you can change various parameters or visualize the data differently.
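For readers who want to inspect the data file without opening the notebook, here is a minimal sketch using only the Python standard library. It is not the code in code/analysis-viz.ipynb: the function name median_words_by_year is hypothetical, and the sketch assumes the CSV headers match the names in the data dictionary below. The three sample rows are copied from that dictionary's example values.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Three rows copied from the example values in the data dictionary;
# the real file has one row per paper and many more columns.
SAMPLE_CSV = """filename,year,words,num_pages
2012/p253-muller,2012,3096,4
2017.5/a033-chounta,2017.5,10613,20
2004/p21-hupfer,2004,3368,4
"""


def median_words_by_year(csv_file):
    """Return {year: median total word count} from an open CSV file object."""
    words_by_year = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # float() tolerates both "3096" and "3096.0" style values.
        words_by_year[float(row["year"])].append(float(row["words"]))
    return {year: median(counts) for year, counts in sorted(words_by_year.items())}


summary = median_words_by_year(io.StringIO(SAMPLE_CSV))
for year, words in summary.items():
    print(year, words)
```

To try this on the real data, pass open("data/cscw-pages-notext.csv", newline="") in place of the in-memory sample, again assuming the header names above.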
This repo now also includes 2019 PACMHCI CSCW data. The notebooks and data files used for the original paper are still here, alongside new data files and notebooks with a -2019 suffix.
Abstract
In this note, I quantitatively examine various trends in the lengths of published papers in ACM CSCW from 2000-2018, focusing on several major transitions in editorial and reviewing policy. The focus is on the rise and fall of the 4-page note, which was introduced in 2004 as a separate submission type to the 10-page double-column "full paper" format. From 2004-2012, 4-page notes of 2,500 to 4,500 words consistently represented about 20-35% of all publications. In 2013, minimum and maximum page lengths were officially removed, with no formal distinction made between full papers and notes. The note soon completely disappeared as a distinct genre, which co-occurred with a trend in steadily rising paper lengths. I discuss such findings both as they directly relate to local concerns in CSCW and in the context of longstanding theoretical discussions around genre theory and how socio-technical structures and affordances impact participation in distributed, computer-mediated organizations and user-generated content platforms. There are many possible explanations for the decline of the note and the emergence of longer and longer papers, which I identify for future work. I conclude by addressing the implications of such findings for the CSCW community, particularly given how genre norms impact what kinds of scholarship and scholars thrive in CSCW, as well as whether new top-down rules or bottom-up guidelines ought to be developed around paper lengths and different kinds of contributions.
Data Dictionary
Column name | Description | Example 1 | Example 2 | Example 3 |
---|---|---|---|---|
filename | Filename (minus .pdf) in the original dataset | 2012/p253-muller | 2017.5/a033-chounta | 2004/p21-hupfer |
words | Total number of words, including references and appendices | 3096 | 10613 | 3368 |
year_float | Year of publication as a float; 2017 Online First is 2017.5 | 2012 | 2017.5 | 2004 |
characters | Total number of characters, including references and appendices | 21327 | 74482 | 22637 |
num_pages | Number of pages in the PDF | 4 | 20 | 4 |
orientation | PDF paper orientation: 0 is portrait, 90 is landscape | 0 | 0 | 0 |
year | Year of publication as a float; 2017 Online First is 2017.5 | 2012 | 2017.5 | 2004 |
words_per_page_total | Number of words per page across the entire document | 774 | 530.65 | 842 |
chars_per_word_total | Number of characters per word across the entire document | 6.88857 | 7.018 | 6.7212 |
appx_start | Character position of the beginning of the appendix (False if no appendix) | False | 69201 | False |
ref_start | Character position of the beginning of the references | 19387 | 60881 | 20976 |
appx_len_chars | Length of appendix in characters | 0 | 5281 | 0 |
ref_len_chars | Length of reference section in characters | 1940 | 8320 | 1661 |
appx_len_words | Length of appendix section in words | 0 | 464 | 0 |
ref_len_words | Length of reference section in words | 274 | 1097 | 235 |
words_per_page | Number of words per page across the entire document | 774 | 530.65 | 842 |
body_len_chars | Length of the main paper in characters (no references or appendices, but includes the front matter) | 19387 | 60881 | 20976 |
body_len_words | Length of the main paper in words (no references or appendices, but includes the front matter) | 2822 | 9052 | 3133 |
appx_prop_words | Appendix length as a proportion of the total paper length (in words) | 0 | 0.04372 | 0 |
ref_prop_words | Reference section length as a proportion of the total paper length (in words) | 0.0885013 | 0.103364 | 0.0697743 |
appx_prop_chars | Appendix length as a proportion of the total paper length (in characters) | 0 | 0.070903 | 0 |
ref_prop_chars | Reference section length as a proportion of the total paper length (in characters) | 0.0909645 | 0.111705 | 0.0733754 |
body_words_per_char | Number of characters per word in the main body (despite the column name) | 6.86995 | 6.7257 | 6.69518 |
ref_words_per_char | Number of characters per word in the reference section (despite the column name) | 7.08029 | 7.58432 | 7.06809 |
appx_words_per_char | Number of characters per word in the appendix (despite the column name; NaN if no appendix) | NaN | 11.3815 | NaN |
title_from_text | Title of the paper (imputed from the paper text, may not be perfect) | Lurking As Personal Trait Or Situational Disp... | When To Say “Enough Is Enough!”: A Study On T... | Introducing Collaboration Into An Application... |
lead_author | Lead author of the paper, according to ACM DL filename | muller | chounta | hupfer |
title_has_quote | 1 if the title contains a quotation mark, 0 if it does not | 0 | 1 | 0 |
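
The derived columns above appear to be simple ratios and differences of the raw counts. The sketch below recomputes them for the Example 2 paper (2017.5/a033-chounta) so the relationships are explicit. The formulas are inferred from the example values in the table, not taken from the processing notebook, and the function name derive_columns is hypothetical.

```python
# Raw counts for 2017.5/a033-chounta, copied from the "Example 2" column above.
paper = {
    "words": 10613,
    "characters": 74482,
    "num_pages": 20,
    "ref_len_words": 1097,
    "ref_len_chars": 8320,
    "appx_len_words": 464,
    "appx_len_chars": 5281,
}


def derive_columns(p):
    """Recompute derived columns from raw counts (inferred formulas, an assumption)."""
    # The body is whatever remains after removing references and appendix.
    body_len_words = p["words"] - p["ref_len_words"] - p["appx_len_words"]
    body_len_chars = p["characters"] - p["ref_len_chars"] - p["appx_len_chars"]
    return {
        "words_per_page": p["words"] / p["num_pages"],
        "chars_per_word_total": p["characters"] / p["words"],
        "body_len_words": body_len_words,
        "body_len_chars": body_len_chars,
        "ref_prop_words": p["ref_len_words"] / p["words"],
        "appx_prop_words": p["appx_len_words"] / p["words"],
        "ref_prop_chars": p["ref_len_chars"] / p["characters"],
        "appx_prop_chars": p["appx_len_chars"] / p["characters"],
        # Named "words_per_char" in the data, but the values are characters per word.
        "body_words_per_char": body_len_chars / body_len_words,
        "ref_words_per_char": p["ref_len_chars"] / p["ref_len_words"],
    }


derived = derive_columns(paper)
for name, value in derived.items():
    print(f"{name}: {value:.6g}")
```

The printed values can be compared against the Example 2 column. Papers without an appendix have appx_len_words of 0, so an appendix ratio would divide by zero, which is consistent with the NaN entries for appx_words_per_char.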