Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

Regexes in util_page.js fail to recognize non-standard locators (e.g. *7, 2.1, 4(b), 3:8), so ranges are treated as single locators

dchawisher opened this issue

It is sometimes necessary to cite pages, sections, paragraphs, etc. using a locator such as 2.1, 4(b), 3:8 (common shorthand for "page 3, line 8" when citing transcripts), or 3¶8 (used when citing numbered paragraphs in a document that is not numbered throughout). When citing a range of such locators, util_page.js fails to recognize it as a range because the locators do not match its regexes. Compounding the problem, util_page.js also replaces en dashes with hyphens, so a manual workaround is impossible.

The regex in question is the following (similar regexes are used at lines 35 and 36):

rangerex = /([0-9]*[a-zA-Z]+0*)?([0-9]+[a-z]*)\s*(?:\u2013|-)\s*([0-9]*[a-zA-Z]+0*)?([0-9]+[a-z]*)/;
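To make the failure concrete, here is a small sketch (runnable in Node.js or a browser console) that runs the existing `rangerex` against sample ranges built from the locator forms above. The `inspectMatch` helper and the sample strings are illustrative assumptions, not part of citeproc-js, and the sketch only shows what the regex itself matches, not how util_page.js goes on to use the result.

```javascript
// Existing range pattern, copied verbatim from util_page.js as quoted above.
const rangerex = /([0-9]*[a-zA-Z]+0*)?([0-9]+[a-z]*)\s*(?:\u2013|-)\s*([0-9]*[a-zA-Z]+0*)?([0-9]+[a-z]*)/;

// Hypothetical sample ranges using the locator forms described in this issue.
const samples = ["12-15", "2.1-2.5", "3:8-3:12", "4(b)-4(d)", "*7-*9"];

// Hypothetical helper: report what (if anything) the pattern matches.
// A range can only be split correctly if the whole input is matched.
function inspectMatch(pattern, input) {
  const m = pattern.exec(input);
  if (m === null) {
    return { input, matched: null, whole: false };
  }
  return { input, matched: m[0], whole: m[0] === input };
}

for (const s of samples) {
  console.log(inspectMatch(rangerex, s));
}
```

Tracing the pattern by hand, `12-15` is matched in full, `2.1-2.5` and `3:8-3:12` match only the fragments `1-2` and `8-3`, and `4(b)-4(d)` and `*7-*9` produce no match at all.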

The following regex (adapted as appropriate for lines 35 and 36) should work without causing problems:

rangerex = /([0-9]*[a-zA-Z]+0*)?([0-9:.§¶*]+\(*[a-z]*\)*)\s*(?:\u2013|-)\s*([0-9]*[a-zA-Z]+0*)?([0-9:.§¶*]+\(*[a-z]*\)*)/;
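A similar sketch can be used to check the proposed pattern. It is written here as a JavaScript literal with the unnecessary escapes inside the character classes dropped (without the `u` flag this does not change what it matches). `splitRange` is a hypothetical helper, not a citeproc-js function, and its requirement that the match cover the whole input is an assumption made for this illustration.

```javascript
// Proposed range pattern from this issue, as a JavaScript regex literal.
// Inside a character class, ":", ".", "§", "¶" and "*" need no escaping.
const proposedRangerex = /([0-9]*[a-zA-Z]+0*)?([0-9:.§¶*]+\(*[a-z]*\)*)\s*(?:\u2013|-)\s*([0-9]*[a-zA-Z]+0*)?([0-9:.§¶*]+\(*[a-z]*\)*)/;

// Hypothetical helper (not part of citeproc-js): split a locator range into
// its two endpoints, or return null if the pattern does not cover the input.
function splitRange(input) {
  const m = proposedRangerex.exec(input);
  if (m === null || m[0] !== input) {
    return null;
  }
  // Groups 1 and 3 are optional alphabetic prefixes; 2 and 4 are the locators.
  const start = (m[1] || "") + m[2];
  const end = (m[3] || "") + m[4];
  return [start, end];
}

// Hyphen-separated samples, plus one using an en dash (\u2013).
const samples = ["12-15", "2.1-2.5", "3:8-3:12", "4(b)-4(d)", "*7-*9", "3¶8\u20133¶12"];
for (const s of samples) {
  console.log(s, "=>", splitRange(s));
}
```

Under this pattern each sample splits into its two full endpoints, e.g. `3:8-3:12` gives `["3:8", "3:12"]` and `4(b)-4(d)` gives `["4(b)", "4(d)"]`. Because `\(*` and `\)*` are independent, unbalanced endpoints such as `4(b` are accepted as well.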