ACM-Publication-Tools

Unofficial information and tools for publication chairs at ACM conferences (especially SIGCHI)

General Workflow

The tools in this repository support the following workflow, as employed for major SIGCHI conferences, such as CHI.

paper submission happens via PCS
authors of accepted papers upload their source files (TeX or Word) to TAPS
TAPS generates HTML and PDF versions
authors check the generated output, modify the source files, re-compile, and approve the final versions
publication chairs assist authors in generating the PDF and HTML files via TAPS and check that the papers follow the ACM formatting guidelines
authors are required to manually make the final PDFs 'accessible' and upload them to PCS
authors also upload all further materials (videos, subtitles, supplementary materials) to PCS

Once these steps have been completed for a conference track, the publications team needs to do the following:

check for missing uploads
download all files from PCS
check whether the PDF files in PCS are really the final versions
check for major formatting problems (wrong template, wrong DOI, missing author information, ...)
check videos for problems
check supplementary materials for problematic content
upload all supplementary files and videos via ACM's upload portal (and remove any uploads that authors might have made)
provide all final PDF versions to Aptara (TAPS) so that they can provide them to ACM
(automatically generate lists and documents from metadata)

Ideally, the publications team also checks all PDFs for formatting issues and missing information (e.g., in references). However, this is only possible for small conferences (we did it for TEI '21).

Important Identifiers

PCS ID: the ID of a submission in PCS (e.g., pn1234). The prefix (pn, lbw, ...) indicates the track to which the submission belongs.
Track ID: the ID of a track in PCS (e.g., chi23b). It appears in PCS URLs. One can assign a prefix (pn) for all submissions in a track.
TAPS ID: the ID of a submission in TAPS (e.g., 12). For each submission, TAPS also stores the PCS ID (called ACM ID in TAPS).
DOI: the Document Object Identifier of a published paper. DOIs are assigned by ACM's e-rights system before papers are imported into TAPS. TAPS also stores the DOI for each submission it handles.
Proceedings ID: the ACM-defined ID for a set of conference proceedings (e.g., 12345). Typically, there is one Proceedings ID for the main proceedings and one Proceedings ID for the extended abstracts. TAPS offers one portal per Proceedings ID.
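
To illustrate how a PCS ID encodes the track, the following minimal sketch splits an ID into its prefix and number. The function name split_pcs_id and the assumed shape of an ID (lowercase letters followed by digits) are inferred from the examples above; they are not taken from the scripts in this repository.

```python
import re

# Assumed shape of a PCS ID: a lowercase track prefix followed by digits (e.g., pn1234).
PCS_ID_PATTERN = re.compile(r"^([a-z]+)(\d+)$")


def split_pcs_id(pcs_id):
    """Split a PCS ID such as 'pn1234' into (prefix, number)."""
    match = PCS_ID_PATTERN.match(pcs_id)
    if match is None:
        raise ValueError(f"not a recognizable PCS ID: {pcs_id!r}")
    prefix, number = match.groups()
    return prefix, int(number)
```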

Tools

The following tools assist the publication team in the tasks mentioned above. They have been developed on Debian Linux but might also work on other operating systems. The code is neither pretty nor performant. It seems to work, however.

the taps_procs.csv file

generated by taps.py from data downloaded from TAPS
contains all relevant metadata for each submission, including download links for PDF and HTML files

the camera-ready csv file

downloaded from PCS by pcs.py
contains all metadata from PCS, including links to all uploaded files and the content of all form fields

the 'fields' csv file

Tools usually process submissions for one track at a time.

All workflow metadata for this track needs to be provided in a file <track ID>_fields.csv.

Here's an example.

tracks,dl_flag,pcs_field,description,directory,suffix,mimetype,upload_to_dl,ready_field
pn,pdf,final_review_pdf,Camera-ready PDF,PDF,.pdf,application/pdf,no,Submission Complete
pn,video,Video Figure (Optional),Video Figure,VID,-video-figure.mp4,video/mp4,yes,
pn,video,Video Figure Captions (Required if the video figure contains spoken dialog),Video Figure Captions,VID_SRT,-video-figure-captions.vtt,text/vtt,yes,
pn,preview,video_preview,Video Preview,PRV,-video-preview.mp4,video/mp4,yes,
pn,preview,video_preview_captions,Video Preview Captions,PRV_SRT,-video-preview-caption.vtt,text/vtt,yes,
pn,talk,Pre-recorded Video of Talks,Talk Video,TLK,-talk-video.mp4,video/mp4,acmdl_agreement,
pn,talk,Video Presentation Caption,Talk Video Captions,TLK_SRT,-talk-video-caption.vtt,text/vtt,acmdl_agreement,
pn,supplement,Supplemental Materials (Optional),Supplemental Materials,SUP,-supplemental-materials.zip,application/zip,yes,

The first row contains the headers. Each subsequent row concerns one file type from PCS.

You can generate a fields file for a track using pcs.py --guess_fields <track ID>. You need to check and customize it manually, however.

Column meanings:

tracks: the track prefix (not used currently)
dl_flag: a short identifier that defines a set of files that belong together (typically videos and subtitles). The dl_flag is used to identify which files to upload/download during a script run.
pcs_field: the name of the column in PCS that contains the links to the files
description: the human-readable description of the file. Used in script output.
directory: the suffix of the subdirectory into which the files should be downloaded (VID ==> ./chi23b_VID/)
suffix: the suffix for the file name (-video-figure.mp4 ==> pn1234-video-figure.mp4)
mimetype: the mimetype of the file - required for upload to ACM DL
upload_to_dl: determines whether the file should be uploaded to the ACM DL. Options: yes, no, or the name of a column in the camera-ready csv whose value (yes/no) determines whether the file should be uploaded. For example, authors might be able to check a box in PCS (with an ID such as dl_agreement) if they want their videos to be uploaded to the ACM DL. All fields csv lines concerning videos should then contain the string dl_agreement in the upload_to_dl column.
ready_field: only the value in the first row is interpreted. That value contains the ID of the PCS form field (usually a checkbox) that determines whether a submission is ready for publication. The checkbox is typically checked by track chairs or the publication team once the submission is ready. Because chairs do not reliably use this field, the scripts currently ignore the ready_field value.
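
The column meanings above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by pcs.py or acm_dl.py: the helper names load_fields, target_path, and should_upload are hypothetical, and treating a missing camera-ready column as "no" is an assumption.

```python
import csv
import io

# Header and three rows copied from the example fields file above.
EXAMPLE_FIELDS = """\
tracks,dl_flag,pcs_field,description,directory,suffix,mimetype,upload_to_dl,ready_field
pn,pdf,final_review_pdf,Camera-ready PDF,PDF,.pdf,application/pdf,no,Submission Complete
pn,video,Video Figure (Optional),Video Figure,VID,-video-figure.mp4,video/mp4,yes,
pn,talk,Pre-recorded Video of Talks,Talk Video,TLK,-talk-video.mp4,video/mp4,acmdl_agreement,
"""


def load_fields(csv_text):
    """Parse fields csv text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def target_path(track_id, pcs_id, field):
    """Combine the 'directory' and 'suffix' columns into a download path."""
    return f"./{track_id}_{field['directory']}/{pcs_id}{field['suffix']}"


def should_upload(field, submission):
    """Resolve upload_to_dl: 'yes', 'no', or the name of a camera-ready csv column."""
    value = field["upload_to_dl"].strip()
    if value.lower() in ("yes", "no"):
        return value.lower() == "yes"
    # Otherwise the value names a column in the submission's camera-ready csv row.
    # Assumption: a missing column counts as "no".
    return submission.get(value, "no").strip().lower() == "yes"
```

Here, submission stands for one row of the camera-ready csv, represented as a dict. For the video row, target_path("chi23b", "pn1234", field) yields ./chi23b_VID/pn1234-video-figure.mp4, matching the directory and suffix examples above.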

Environment variables

The following environment variables should be set before running any of the scripts.

export TAPS_USER="<name used for TAPS login>"
export TAPS_PASSWORD="<password for TAPS login>"
export PCS_USER="<the email address used for logging into PCS (personal account of publication chair)>"
export PCS_PASSWORD="<password for PCS>"
export CONF_ID="<five-digit proceedings ID>"
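
Before starting a long download, it can help to verify that all of these variables are set. The following sketch shows one way to do that; the function name missing_variables is hypothetical, and the scripts in this repository may handle missing variables differently.

```python
import os

# Variable names as listed above.
REQUIRED_VARIABLES = ("TAPS_USER", "TAPS_PASSWORD", "PCS_USER", "PCS_PASSWORD", "CONF_ID")


def missing_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Please set: " + ", ".join(missing))
```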

pcs.py - download files from PCS and sort them

This Python script helps with checking the state of files in PCS, automatically downloading files from PCS, naming them appropriately, and sorting them into folders.

Commands and parameters:

pcs.py --guess_fields chi23b - generate a fields csv (chi23b_fields.csv.test) that you can customize and use for the subsequent steps
pcs.py --tracks X - lists all the tracks which the user has access to. (The 'X' is a placeholder: the script expects a parameter here but ignores its value. TODO)
pcs.py chi23b pdf video - download PDF and video files for track chi23b into the subdirectories specified in the fields csv. Instead of dl_flags, the parameter all can be provided to download all file types specified in the fields csv.

taps.py

This Python script helps with downloading metadata, PDF files, and HTML files from TAPS.

taps.py - download/scrape the file taps_procs.csv from TAPS
taps.py --pdf - download taps_procs.csv and download all PDF files in TAPS into a directory TAPS_PDF
taps.py --html - download taps_procs.csv and download all HTML files in TAPS into a directory TAPS_HTML (only HTML files, no media)
taps.py --all - download taps_procs.csv, PDF and HTML files.

lint.py

This Python script checks PDF files from PCS for common formatting problems. The file taps_procs.csv and the directories TAPS_HTML and TAPS_PDF need to be in the current working directory. They need to contain the PDF and HTML files downloaded from TAPS with taps.py. Also, the directory <track ID>_PDF needs to be in the current directory and contain the final PDF files submitted to PCS (download with pcs.py).

Modify the source code to disable certain checks.

python3 -u ../ACM-Publication-Tools/lint.py chi23b | tee chi23b_lint.log - run all checks from lint.py, output the results to the terminal and write them into the file chi23b_lint.log. Also creates a CSV file chi23b_PDF.csv (yeah, inconsistent naming) which lists for each file which checks have failed.

(python3 -u is just needed to output unbuffered lines so that tee immediately prints them to the terminal)

Example linter log:


# Checking chi23b_PDF/pn1022.pdf
pn1022: check_line_length: Strange! Median line length is less than 40 (0) - which should not happen in the ACM two-column layout.
pn1022: check_differences_title: Different titles in HTML and PDF. Please check:
    Social Virtual Reality as a Mental Health Tool: How People Use VRChat to Support Social Connectedness and Wellbeing
    
pn1022: check_email: None of the authors has an email address given!
pn1022: check_ligatures_fi: Accessibility: the PDF plain text does not contain the letters 'fi' (but HTML does). Please check whether ligatures are encoded correctly.
pn1022: check_ligatures_ff: Accessibility: the PDF plain text does not contain the letters 'ff' (but HTML does). Please check whether ligatures are encoded correctly.
pn1022: check_pdf_creator: This PDF has not been generated by TAPS ('Producer' field in metadata says: )
pn1022: check_differences_reference_count: Different number of references found in HTML (70) and PDF (0). Please check.
pn1022: check_pdf_doi: DOI might be missing in PDF. DOI in HTML file: https://doi.org/10.1145/3544548.3581103
pn1022: check_pdf_difference_taps_pdf: File sizes in TAPS (0.54 MB.) and PCS (8.65 MB). differ by more than 20% - maybe not the same file.

Most checks generate a lot of false positives due to the limitations of PDF parsing. You should typically ignore check_differences_reference_count. The check for ligatures fails if authors have misused Adobe Acrobat to 'add accessibility' to their submission.
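
The last line of the example log reports a file-size difference of more than 20%. How lint.py computes that difference is not documented here; the sketch below shows one plausible approach that measures the difference relative to the larger file. The function name sizes_differ and its threshold parameter are hypothetical.

```python
def sizes_differ(size_a, size_b, threshold=0.2):
    """Return True if two file sizes differ by more than `threshold`.

    Assumption: the difference is measured relative to the larger of the two sizes.
    """
    larger = max(size_a, size_b)
    if larger == 0:
        # Two empty files are treated as identical in size.
        return False
    return abs(size_a - size_b) / larger > threshold
```

The sizes could be obtained with os.path.getsize for the TAPS and PCS versions of the same paper. With the values from the log above (0.54 MB and 8.65 MB), this function returns True.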

acm_dl.py

This Python script uploads supplementary files to ACM's Atypon system. It automatically recognizes whether a file has already been uploaded - unless you manually change the description of an uploaded file. If files have been marked as 'excluded', they are ignored, i.e., a new version of that file can be uploaded with the same description.

acm_dl.py list 12345 - downloads a list of all files that have already been uploaded for Proceedings ID 12345 and saves it as 12345.cache.csv
acm_dl.py upload 12345 chi23b video - uploads (for Proceedings ID 12345) all video files for track chi23b. Uses data from the fields csv. Uploaded files are named <DOI-part>-<description>.<ext>, e.g., 323443.24231-video-figure.mp4. This is how ACM prefers it.
acm_dl.py exclude 12345 pn1234-video-figure.mp4 - marks all uploaded versions of this file as excluded. The function can be extended to select uploads by DOI or uploader instead. For performance reasons, this command uses the cache csv downloaded by acm_dl.py list instead of getting the list of uploaded files each time.
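
The naming scheme for uploaded files can be illustrated as follows. This is a sketch under two assumptions that the text above does not confirm: that the DOI part is everything after the last slash of the DOI, and that the rest of the name equals the suffix column of the fields csv. The function name dl_filename is hypothetical.

```python
def dl_filename(doi, suffix):
    """Build '<DOI-part><suffix>' from a DOI and a fields csv suffix.

    Assumption: the DOI part is the text after the last '/' of the DOI.
    """
    doi_part = doi.rstrip("/").rsplit("/", 1)[-1]
    return f"{doi_part}{suffix}"
```

For example, dl_filename("10.1145/323443.24231", "-video-figure.mp4") gives 323443.24231-video-figure.mp4, the file name used as an example above.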

check_video.py

This Python script checks the properties of video files and prints the results as CSV to the terminal.

clean_supplementary_materials.sh

removes unwanted files (__MACOSX/, .git/) from the ZIP files of supplementary materials
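
The same kind of cleanup can be sketched in Python with the standard zipfile module. This is an illustration of the idea, not a port of the shell script: the function names are hypothetical, and writing a cleaned copy instead of modifying the archive in place is a design choice made here for safety.

```python
import zipfile

# Directory names treated as unwanted, as listed above.
UNWANTED_DIRECTORIES = ("__MACOSX", ".git")


def is_unwanted(member_name):
    """True if any path component of the archive member is an unwanted directory."""
    return any(part in UNWANTED_DIRECTORIES for part in member_name.split("/"))


def clean_zip(source_path, target_path):
    """Copy source_path to target_path without unwanted entries; return the skipped names."""
    skipped = []
    with zipfile.ZipFile(source_path) as source, \
            zipfile.ZipFile(target_path, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            if is_unwanted(info.filename):
                skipped.append(info.filename)
                continue
            # Reusing the ZipInfo object keeps the original name and timestamp.
            target.writestr(info, source.read(info.filename))
    return skipped
```

Matching whole path components means that a file such as .gitignore is kept, while everything below a .git/ or __MACOSX/ directory is dropped at any nesting depth.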

compress_pdf.sh / compress_video.sh

compresses videos or PDFs (usually without visible loss in quality).

rename_files.py

simple helper to rename files from PCS ID to TAPS ID (for submission to Aptara)

srt-to-vtt.py

small helper to convert subtitle files into the .vtt format required by ACM.
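
For reference, the core of such a conversion is small: a WebVTT file starts with a WEBVTT header line, and its timestamps use a dot instead of a comma before the milliseconds. The sketch below shows this under simplifying assumptions (well-formed SRT input, no styling tags); it is not the code of srt-to-vtt.py, and the function name srt_to_vtt is hypothetical.

```python
import re

# SRT timestamps use a comma before the milliseconds (00:00:01,000); WebVTT uses a dot.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT subtitle text to WebVTT text."""
    # Drop a leading byte order mark and normalize Windows line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    converted_lines = []
    for line in text.split("\n"):
        # Only touch timing lines so that commas in the subtitle text stay intact.
        if "-->" in line:
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted_lines.append(line)
    return "WEBVTT\n\n" + "\n".join(converted_lines)
```

The numeric counters from the SRT file are kept; WebVTT accepts them as cue identifiers.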

Acknowledgments

Thanks to Raphael Wimmer for the amazing work of coding the initial code base of this repository for CHI'22 and CHI'23.
