Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

CSL-M: How useful are the CSL-M specific prefix/suffix rules?

denismaier opened this issue · comments

As the locale conditions are now working thanks to your work on #107 (thanks again, Frank), I am trying to clean up all the validation errors caused by improper spacing in prefix or suffix attributes. (The style is now here.)

Now, I am not entirely convinced that the modification to affixes in CSL-M is really useful. I have a few concerns:

  1. It makes converting existing styles to CSL-M more difficult than necessary.
  2. This constraint can force really ugly structures: group inside group inside group.
  3. While I totally understand the reason for this modification, aren't there easier ways to get the desired behaviour? For example, some post-processing cleanup at the end, such as collapsing multiple spaces. I cannot think of a case where we would want two spaces, so post-processing should not be a problem for other styles.
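
To make the third point concrete, here is a minimal sketch of the kind of end-of-pipeline cleanup being suggested. It is written in Python purely for illustration; `collapse_spaces` is a hypothetical helper, not part of citeproc-js, and it works on an already rendered plain string rather than on the processor's internal output.

```python
import re


def collapse_spaces(rendered: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", rendered)


# A string with doubled spaces, as might result from a suffix " "
# meeting a prefix " " at a join.
print(collapse_spaces("Doe, J.  (2019).  Title"))
```

A pass like this only sees the final string, so it cannot tell which affix or delimiter produced a given space.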

What do you think?

This is a good discussion to have. There was a small flame war over this issue about a decade ago (time flies!). Andrea Rossato, author of pandoc's citeproc-hs, raised the unpredictable breakage of delimiters as a fundamental flaw in CSL design that demanded ground-up refactoring of all CSL styles, moving all leading/trailing space affixes (at least) to group delimiter attributes. I took the position that the burden on style maintainers would be too great, and that processors should do implicit cleanup.

In the end, I implemented cleanup in citeproc-js. It's quite a hard problem, because the fixes have to be applied after initial rendering to the intermediate citation object, and that's a deeply nested structure with strings, delimiters, and affixes scattered all over the hierarchy. There are something like 15 possible pairing patterns, each of which needs to be checked across the entire tree. As Andrea argued at the time, the post-processing masks a lot of underlying flaws in existing styles, and the approach is not easy for other implementations to follow. But it's a compromise that helps things to more or less hold together.
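
The shape of the problem can be sketched roughly as follows. This is not the citeproc-js algorithm: the `Node` class, `flatten`, and `join_clean` are hypothetical names, the sketch is in Python for brevity, and it handles only two of the many pairing patterns (space meeting space, and a punctuation mark meeting the same mark) across fragment boundaries in a nested tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Node:
    """Hypothetical stand-in for one level of a rendered citation tree."""
    children: List[Union["Node", str]] = field(default_factory=list)
    prefix: str = ""
    suffix: str = ""
    delimiter: str = ""


def flatten(node: Union[Node, str]) -> Iterator[str]:
    """Yield strings, affixes, and delimiters in output order."""
    if isinstance(node, str):
        if node:
            yield node
        return
    parts = [list(flatten(child)) for child in node.children]
    parts = [p for p in parts if p]  # drop children that rendered nothing
    if not parts:
        return  # an empty group emits no affixes
    if node.prefix:
        yield node.prefix
    for i, part in enumerate(parts):
        if i and node.delimiter:
            yield node.delimiter
        yield from part
    if node.suffix:
        yield node.suffix


def join_clean(fragments: Iterator[str]) -> str:
    """Join fragments, fixing two pairing patterns at each boundary."""
    out = ""
    for frag in fragments:
        if out and frag:
            last, first = out[-1], frag[0]
            if last == " " and first == " ":
                frag = frag.lstrip(" ")  # space meets space
            elif last in ",.;:" and first == last:
                frag = frag[1:]  # same punctuation mark twice
        out += frag
    return out


tree = Node(
    children=[
        Node(children=["Doe"], suffix=","),
        Node(children=["Title"], prefix=" ", suffix="."),
    ],
    delimiter=", ",
    suffix=".",
)
print("".join(flatten(tree)))    # naive join
print(join_clean(flatten(tree)))  # join with boundary cleanup
```

Even in this toy version the fix has to look across boundaries created at different depths of the tree, which is where the real difficulty lies.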

The affix rules in the CSL-M schema adopt Andrea's preferred approach of imposing stricter discipline on style code. That was possible in CSL-M at the time, because styles in the family were few; but as you say, it makes it harder to adapt styles from the CSL repo to take advantage of CSL-M extensions.

I think there is value in the stricter discipline around affixes, particularly in the jurism macro that drives modular support. In that context, the return from macros is entirely unpredictable, and relying on affixes for joins would be particularly fragile and difficult to debug. That said, there is no reason not to offer a looser variant of the schema. It would come with a higher risk of weird punctuation bugs from unanticipated input, but so long as that's known, there isn't any deeper reason to enforce the constraints.
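
The fragility of affix-based joins can be illustrated with a small sketch. The two functions below are hypothetical and written in Python for illustration only; they mimic, in a very reduced form, the difference between giving each element its own leading prefix and letting an enclosing group supply a delimiter between non-empty elements.

```python
from typing import List, Tuple


def render_with_prefixes(parts: List[Tuple[str, str]]) -> str:
    """Emit each non-empty value with its own prefix, whatever came before it."""
    return "".join(prefix + value for prefix, value in parts if value)


def render_with_delimiter(values: List[str], delimiter: str) -> str:
    """Join only the non-empty values, as a group delimiter would."""
    return delimiter.join(v for v in values if v)


# The first element (say, a macro that returned nothing) is empty.
print(render_with_prefixes([("", ""), (", ", "Title"), (", ", "2019")]))
print(render_with_delimiter(["", "Title", "2019"], ", "))
```

With prefixes, the empty first element leaves a stray leading ", "; with a delimiter, the join adapts to whatever actually rendered.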

This might be a good excuse to look into a "feature" versioning mechanism that Cormac proposed a while ago. The validation framework would need to sniff the appropriate schema from attributes set in the style (as it does now, but in a crude way). If we can work out a consistent way to represent schema variants common to all processors, this should be possible.
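
As a rough illustration of what "sniffing" from style attributes could look like, here is a minimal Python sketch. The rule it uses (a `version` value containing "mlz" selects the CSL-M schema) is an assumption based on the `version="1.1mlz1"` value mentioned in this thread, and `pick_schema` and its return labels are hypothetical, not the actual validator logic.

```python
import xml.etree.ElementTree as ET


def pick_schema(style_xml: str) -> str:
    """Return a schema label based on the root element's version attribute."""
    root = ET.fromstring(style_xml)
    version = root.get("version", "")
    # Assumption: CSL-M styles carry an "mlz" marker in the version string.
    return "csl-m" if "mlz" in version else "csl"


csl_style = '<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0"/>'
cslm_style = '<style xmlns="http://purl.org/net/xbiblio/csl" version="1.1mlz1"/>'
print(pick_schema(csl_style), pick_schema(cslm_style))
```

A feature-level mechanism would need something finer than a single two-way switch like this, which is part of what makes the proposal non-trivial.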

In the short term, I can take a look at the style, and help out with the adaptation work, if you can link to the current version (the link above yields a "not found" error).

Here is a gist of the May 6 version linked above, aligned with the CSL-M schema. It might need a review of the locator behavior; I've used some green code to handle those joins.

https://gist.github.com/fbennett/c9f163dcda335d398dd3168d2a47d269

Green code?

Oh, and the gist was not the current version. Seems that the link to the new version was broken... Here is the correct link to the file; it lives here

Thanks for applying the changes. I'll need to check how much I've changed since May... (probably not much...)

Ok, looking over the diffs, it seems I can just copy your changes to my style. Thanks again.

Hmmm, two things:

  1. The locators are gone. Is this because of require="comma-safe"? (I am not sure I understand what it does.)

  2. I don't get the & although I have and="symbol" all over the place. Also, this should be inheritable, right? (This style will produce ampersands if I set version="1.0", but they disappear with version="1.1mlz1".)

By the way, it turns out that the processor picks up the locale test even if we are in CSL 1.0.1 mode. Is this intended?

Concerning the schema variants: What about the approach using plus signs and minus signs to indicate additional features?

One possible drawback: what would this mean for schema validation in Emacs, Atom, etc.? Could be a deal-breaker.

Anyway, that's probably a discussion to be had on the discourse forum, right?

> One possible drawback: what would this mean for schema validation in emacs, atom, etc.

That's a very good point. The schemata are monolithic, as far as I understand, and that would indeed be a problem for fine-grained feature declarations.

> Anyway, that's probably a discussion to be had on the discourse forum, right?

Our CSL-M solutions are in flux; maybe once things settle down.

> The locators are gone.

That's ... a problem, ouch. I took a lazy approach to adapting locators, and it obviously didn't work. Several things are piled up here, but I'll try to look at it as soon as I can.

I took a quick look, and the issues were less daunting than I thought. The style has two possible joins, based on simple evaluation of item type, so I just split locators into two macros and assigned them to group positions. The require/reject stuff that I added was just getting in the way, so I removed it.

The disappearing ampersand turns out to be straightforward. When a style overrides a locale term, all existing definitions of it are removed before adding the definitions set by the style. In CSL-M mode, the "en" condition takes effect and selects the "en" locale. Since long-form "and" is redefined there, the short-form "&" needs to be reset also.
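
The override behaviour described above can be modelled with a small sketch. It is in Python for illustration; `apply_style_terms` and the dictionary layout are hypothetical, and the form names "long" and "symbol" are assumptions standing in for the long-form and short-form definitions of the term.

```python
from typing import Dict

Terms = Dict[str, Dict[str, str]]  # term name -> {form: text}


def apply_style_terms(locale_terms: Terms, style_terms: Terms) -> Terms:
    """Overlay style-defined terms; an overridden term loses all prior forms."""
    merged = {name: dict(forms) for name, forms in locale_terms.items()}
    for name, forms in style_terms.items():
        merged[name] = dict(forms)  # replace, do not merge form by form
    return merged


locale_en = {"and": {"long": "and", "symbol": "&"}}

# Redefining only the long form drops the symbol form.
broken = apply_style_terms(locale_en, {"and": {"long": "and"}})
print(broken["and"].get("symbol"))

# Redefining both forms keeps the ampersand available.
fixed = apply_style_terms(locale_en, {"and": {"long": "and", "symbol": "&"}})
print(fixed["and"].get("symbol"))
```

Under this model, a style that touches a term at all has to restate every form of it that it still wants to use.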

(Edit: I filed a pull request against the style with the necessary changes.)

> By the way, it turns out that the processor picks up the locale test even if we are in CSL 1.0.1 mode. Is this intended?

It looks like in CSL mode it evaluates the test, but the locale-switch doesn't take effect. Not sure how much it would take to bring that online.

Closing this for now.