New value for `require`

Question

georgd opened this issue 4 years ago · comments

Georg Mayr-Duffner commented 4 years ago

Seeking to consolidate Austrian and international rules in one jurism macro, I’ve come to the conclusion that this won’t be 100% possible, as I hit a wall at the delimiter before jurism-locator. I could, however, fix most cases if I had a new value for `require` that complies with the Austrian rules for page numbers (translated from the AZR rulebook, margin no. 25):

If a page number immediately follows another number, they have to be delimited by a comma. In all other cases, the page number is only preceded by a space. If a page number follows a superscript edition number no comma is to be set.

Examples:
Krejci, OJZ 2011, 346.
Mayr, Vergleichsversuch 90.
Knyrim, Datenschutzrecht³ 103.

@fbennett could this please be implemented as a new value, require="comma-safe-number"?

In this test, no comma should be set if the preceding number is suffixed by a closing parenthesis and no comma may be set if the page number itself is prefixed with an opening parenthesis.

In the following sequences of 123 456, only the first is delimited by a comma:

123, 456
123) 456
123 (456
<sup>123</sup> 456
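
The rule as stated in this comment can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not the citeproc-js implementation: the function names `needs_comma` and `join` are hypothetical, and the sketch assumes that the preceding and following fragments arrive as plain strings, with superscripts written as `<sup>…</sup>` as in the list above.

```python
import re


def needs_comma(preceding: str, following: str) -> bool:
    """Return True when a comma belongs between the two fragments.

    Hypothetical helper: a comma is set only when a bare ASCII digit
    is directly followed by a bare ASCII digit.
    """
    # ")" or "</sup>" or "³" at the end of the preceding text is not a bare digit.
    ends_with_digit = re.search(r"[0-9]$", preceding.rstrip()) is not None
    # "(" or a letter at the start of the following text is not a bare digit.
    starts_with_digit = re.match(r"[0-9]", following.lstrip()) is not None
    return ends_with_digit and starts_with_digit


def join(preceding: str, following: str) -> str:
    """Join two fragments with ", " or a plain space, per the rule above."""
    separator = ", " if needs_comma(preceding, following) else " "
    return preceding + separator + following


if __name__ == "__main__":
    for before, after in [
        ("123", "456"),
        ("123)", "456"),
        ("123", "(456"),
        ("<sup>123</sup>", "456"),
    ]:
        print(join(before, after))
```

Because both sides must be bare digits, the same check also leaves out the comma whenever either side is a letter.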

Frank Bennett · Answer 1 · Mon Sep 28 2020 23:38:28 GMT+0800 (China Standard Time)

Those look like better results for comma-safe itself. I've built a test based on your examples above; tomorrow I'll work on the changes in place.

Georg Mayr-Duffner · Answer 2 · Tue Sep 29 2020 04:17:51 GMT+0800 (China Standard Time)

I suggested a new value because as I read it, comma-safe also tests true if the second part of the sequence starts with a 'romanesque' character. In our case, this never should happen. The following sequences never should test true for us:

abc def
123 def
abc 123
abc) def
abc (def

Frank Bennett · Answer 3 · Tue Sep 29 2020 05:10:43 GMT+0800 (China Standard Time)

So it is correct that no comma should be used in these cases?

Georg Mayr-Duffner · Answer 4 · Tue Sep 29 2020 05:31:32 GMT+0800 (China Standard Time)

Correct. Locator labels are presumably considered clear enough to do without the comma.

Frank Bennett · Answer 5 · Tue Sep 29 2020 11:05:38 GMT+0800 (China Standard Time)

The new value (require="comma-safe-numbers-only") is available in the latest Jurism beta and in citeproc-test-runner.

Georg Mayr-Duffner · Answer 6 · Tue Sep 29 2020 15:19:51 GMT+0800 (China Standard Time)

That’s great! Thank you! Now it’s time for refactoring :)

Frank Bennett · Answer 7 · Tue Sep 29 2020 15:33:58 GMT+0800 (China Standard Time)

Looking forward to it! If any further adjustments are needed, just let me know.

Georg Mayr-Duffner · Answer 8 · Tue Sep 29 2020 21:37:21 GMT+0800 (China Standard Time)

@fbennett somehow this doesn’t work when a locator label is involved. See #153

Frank Bennett · Answer 9 · Tue Sep 29 2020 21:44:23 GMT+0800 (China Standard Time)

Yes, I wasn't sure how that case should be handled. Will make the adjustment.

Frank Bennett · Answer 10 · Tue Sep 29 2020 21:47:52 GMT+0800 (China Standard Time)

The code currently doesn't distinguish between a prefix and a term. The effect of the two differs in this case, so I'll need to think a bit about how to get a better result, but it should be fixed tomorrow.

Georg Mayr-Duffner · Answer 11 · Tue Sep 29 2020 21:51:59 GMT+0800 (China Standard Time)

I don’t understand. If there’s a term, like with a prefix, the require test should be false. So, if they’re treated the same, shouldn’t it work already?

Frank Bennett · Answer 12 · Wed Sep 30 2020 03:04:41 GMT+0800 (China Standard Time)

... yes, that's right. I was still struggling with the logic.

Georg Mayr-Duffner · Answer 13 · Wed Sep 30 2020 03:36:05 GMT+0800 (China Standard Time)

Ok. Do you need more information from me?

Frank Bennett · Answer 14 · Wed Sep 30 2020 05:21:06 GMT+0800 (China Standard Time)

It should be fixed now. citeproc-test-runner has been updated, and the revised beta should come up in a few minutes.

Georg Mayr-Duffner · Answer 15 · Wed Sep 30 2020 05:44:09 GMT+0800 (China Standard Time)

Thanks for your work! I’m really sorry to say that with dates it doesn’t work right yet. See #155

Edit: not only with dates. Still need to look into it further.

Georg Mayr-Duffner · Answer 16 · Wed Sep 30 2020 05:51:44 GMT+0800 (China Standard Time)

Ok, a number variable ending in a letter will trigger the wrong result as well; I updated the test accordingly.

Frank Bennett · Answer 17 · Wed Sep 30 2020 05:53:27 GMT+0800 (China Standard Time)

Thanks for the test, it helps a lot.

Georg Mayr-Duffner · Answer 18 · Wed Sep 30 2020 05:55:51 GMT+0800 (China Standard Time)

I think, slowly, I’m getting used to it :) But now, it’s bedtime for me.

Frank Bennett · Answer 19 · Wed Sep 30 2020 07:36:03 GMT+0800 (China Standard Time)

The client and test runner are now updated. Feels like we're getting close to a release with these various changes.

Georg Mayr-Duffner · Answer 20 · Wed Sep 30 2020 13:31:54 GMT+0800 (China Standard Time)

Thank you! Date variables with affixes are still not working correctly.

I’ll write a test a bit later.

Frank Bennett · Answer 21 · Wed Sep 30 2020 15:13:38 GMT+0800 (China Standard Time)

Okay. It might be an issue with affixes, or (possibly?) with detecting the cs:date output at the head of the comma-safe group, rather than immediately before? In any case, I'll look forward to the test.

Georg Mayr-Duffner · Answer 22 · Wed Sep 30 2020 15:34:31 GMT+0800 (China Standard Time)

Added two more tests.

Edit: the second test is for another issue I discovered: if a locator mixes numbers and letters and starts with a letter, as in a123, require="comma-safe-numbers-only" wrongly tests true when the locator follows an element ending in a number.

Frank Bennett · Answer 23 · Wed Sep 30 2020 18:09:41 GMT+0800 (China Standard Time)

Yay, thanks.

Georg Mayr-Duffner · Answer 24 · Wed Sep 30 2020 21:54:07 GMT+0800 (China Standard Time)

And another one: A group at the beginning of a citation should never test true for any require="comma-safe-*" test, what do you think?

See #158

Georg Mayr-Duffner · Answer 25 · Wed Sep 30 2020 22:12:08 GMT+0800 (China Standard Time)

A bit of context for the last one:

The jurism scaffolding places jurism-locator in the middle of other jurism-macros. However, in Europe it’s quite common to place the locator at the beginning of the citation for certain legal sources (mainly statutes and treaties).

In Austrian styles, a page number given as a pinpoint after the starting page number of the reference is to be set in parentheses (cf. https://github.com/georgd/jm-style-tests/blob/jm-legcit/jm-leg-cit-ohne-verzeichnisse/chapFS_test011.txt). So, I test for the presence of variable="page" and of locator="page", and if both are present and require="comma-safe-numbers-only" is true, the locator is set in parentheses. [This assumes that a locator page immediately follows a variable starting page, which will be fine with all style modules I checked.] (see https://github.com/georgd/jm-styles/blob/f89bb89ae2e5b815a95c881dc77356b2c1f98a93/jm-leg-cit-rechtsquellenverzeichnis-literaturverzeichnis.csl#L537-L576)

When this is applied to non-Austrian statutes that carry the locator at the beginning of the citation but also contain a page number, the current setup puts the locator in parentheses because it is deemed comma-safe. (see https://github.com/georgd/jm-style-tests/blob/jm-legcit/jm-leg-cit-rechtsquellenverzeichnis-literaturverzeichnis/euLaw_test025.txt)

Georg Mayr-Duffner · Answer 26 · Wed Sep 30 2020 22:26:17 GMT+0800 (China Standard Time)

> The client and test runner are now updated. Feels like we're getting close to a release with these various changes.

Btw, I think so, too. Please, could you merge Juris-M/legal-resource-registry#22 before doing the release? Thanks!

Georg Mayr-Duffner · Answer 27 · Thu Oct 01 2020 17:48:40 GMT+0800 (China Standard Time)

I’m really sorry. I unearthed another one (#159): When a date is part of a macro that gets the affix when being called, the test is wrong.

Frank Bennett · Answer 28 · Fri Oct 02 2020 05:47:53 GMT+0800 (China Standard Time)

All set. citeproc-test-runner has the revised processor, and the Jurism beta will update in a few minutes. (It wasn't examining suffixes set on a cs:text macro call, this was a good one to catch.)

Georg Mayr-Duffner · Answer 29 · Fri Oct 02 2020 06:42:38 GMT+0800 (China Standard Time)

The test is fine, the style test still fails. More investigation to come...

Georg Mayr-Duffner · Answer 30 · Fri Oct 02 2020 06:44:35 GMT+0800 (China Standard Time)

It’s nested text macros with date in parentheses that are still failing. New test soon...

Edit: test is #160

Frank Bennett · Answer 31 · Fri Oct 02 2020 06:53:06 GMT+0800 (China Standard Time)

Okaaay ... thanks for following up, more soon ...

Georg Mayr-Duffner · Answer 32 · Fri Oct 02 2020 07:09:54 GMT+0800 (China Standard Time)

Thank you very much for all your work!! Now bedtime for me.

Frank Bennett · Answer 33 · Fri Oct 02 2020 07:56:01 GMT+0800 (China Standard Time)

Updates are ready again. I think this should do it.

Georg Mayr-Duffner · Answer 34 · Fri Oct 02 2020 13:39:47 GMT+0800 (China Standard Time)

Yes!!!!! That did it! Thank you very very much for your efforts!

Georg Mayr-Duffner · Answer 35 · Thu Oct 08 2020 04:34:20 GMT+0800 (China Standard Time)

@fbennett I feel a bit bad for still not leaving this in peace. I found a case where I want to test for comma-safe-numbers-only twice in a row and found that the result of the first would proliferate to the second.

The test is here: #166

Frank Bennett · Answer 36 · Fri Oct 09 2020 19:38:23 GMT+0800 (China Standard Time)

Bug found and fixed. The problem was that the cs:text prefix and content were being ignored in require/reject evaluation (e4eeb20). Also caught a lurking bug with 04854d3.

Georg Mayr-Duffner · Answer 37 · Fri Oct 09 2020 21:47:43 GMT+0800 (China Standard Time)

Thank you! I’m still seeing it in my style tests after pulling this repo and running cslrun from within the citeproc-js folder. It appears within the jurism macro, when gluing all the others together. Am I possibly doing something wrong? For example, should I try compiling the test runner from source as well? I can’t reproduce it in a test fixture.

Frank Bennett · Answer 38 · Fri Oct 09 2020 22:41:57 GMT+0800 (China Standard Time)

Sorry, it's been tested only in the source repo. I've updated the test runner, refreshing your copy should yield better test results now. Client update now in progress ...

Frank Bennett · Answer 39 · Fri Oct 09 2020 23:03:15 GMT+0800 (China Standard Time)

Before I refresh the client, I'll take a look with your latest modules and styles.

Frank Bennett · Answer 40 · Fri Oct 09 2020 23:17:19 GMT+0800 (China Standard Time)

Which test is still failing for you? The two tests mentioned earlier in the thread are passing. I'm getting 21 failures with jm-leg-cit-ohne-verzeichnisse.csl, and 43 failures with jm-leg-cit-rechtsquellenverzeichnis-literaturverzeichnis.csl. I'm not sure which of those manifests the error you're referencing.

Georg Mayr-Duffner · Answer 41 · Sat Oct 10 2020 00:12:17 GMT+0800 (China Standard Time)

It’s, for example, atCaseCommenter_test009 with jm-leg-cit-rechtsquellenverzeichnis-literaturverzeichnis.csl, in the bibliography.

On my system, most of the failures are from the test runner not picking up the language variants from the abbrevs files. But that doesn’t seem to be the case on your side, with only 21 for the other style?

Frank Bennett · Answer 42 · Sat Oct 10 2020 07:06:53 GMT+0800 (China Standard Time)

Abbrevs seem to be supplied only from the test runner's internal copy of the abbreviation data, so there are two issues for us to work through:

1. Is your copy of citeproc-test-runner installed/updated as a GitHub repo clone, or via npm install/update citeproc-test-runner? We should make sure that your running copy is up to date.
2. It will be preferable for your testing purposes to make the abbrevs source configurable via cslrun.yaml, so I'll work on that.

In addition, I think the test runner is probably not locale-aware when selecting the abbreviation source, so I'll need to work on that as well.

Frank Bennett · Answer 43 · Sat Oct 10 2020 07:52:37 GMT+0800 (China Standard Time)

A bit of inspection, and a few things to add.

1. The abbreviation source directory can be set from the command line with an -A= or --abbreviations option;
2. The abbrevs source directory cannot (yet) be configured in cslrun.yaml; and
3. Abbreviations are drawn only from the primary named file, ignoring extensions.

The third item is the first that should be fixed at my end. Will post when it's ready for update in the test runner.

Frank Bennett · Answer 44 · Sat Oct 10 2020 09:31:35 GMT+0800 (China Standard Time)

We have an issue to resolve in the logic around jurisdiction-preferences. In the current setup, it performs two overlapping roles. It sets the preferred style-domain variant (i.e. LegCit or IndigoBook), and also the language-domain variant (i.e. fr or de or de-AT), and the combination is confusing. (Mashing the two together is not great design, but it may not be awful enough to push us to make deep changes.)

Style modules are cast for a style domain. So if a parent style prefers LegCit modules, the processor will try to find it, and fall back to the default if it is not available. Style modules are indifferent to language, so the language-domain elements in jurisdiction-preferences will not have a matching module, and will be ignored.

It's less clear how selection of an abbreviation set should work. It could be driven by:

* The language domain of the style;
* The language domain of the item being rendered;
* An arbitrary name (possibly a language code) set in jurisdiction-preferences.

I'm thinking that the abbrevs extension used should be the first to match among:

1. An extension listed in jurisdiction-preferences;
2. The item language;
3. The language of the style;
4. The default abbreviation set (as fallback).

What do you think?
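
The proposed first-match order can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the processor or the abbrevs plugin: the function name `pick_abbrev_variant`, its parameters, and the `"default"` label are all hypothetical, and the preference list is assumed to be already sorted with the most preferred extension first (how the attribute string itself is scanned is not modelled here).

```python
from typing import Optional, Sequence


def pick_abbrev_variant(
    available: Sequence[str],
    preferences: Sequence[str],
    item_language: Optional[str],
    style_language: Optional[str],
    default: str = "default",
) -> str:
    """Return the first abbreviation variant that exists for a jurisdiction.

    Order tried: extensions from jurisdiction-preferences, then the item
    language, then the style language, then the default set as fallback.
    """
    candidates = list(preferences)
    if item_language:
        candidates.append(item_language)
    if style_language:
        candidates.append(style_language)
    for name in candidates:
        if name in available:
            return name
    return default


if __name__ == "__main__":
    swiss_variants = ["de", "fr", "it", "rm"]
    # No preference matches, so the item language decides.
    print(pick_abbrev_variant(swiss_variants, ["enIBFD"], "fr", "de"))
```

Placing the item language ahead of the style language means a style only decides the variant when neither the preferences nor the item itself name one that exists.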

Georg Mayr-Duffner · Answer 45 · Sat Oct 10 2020 19:13:51 GMT+0800 (China Standard Time)

> Is your copy of citeproc-test-runner installed/updated as a GitHub repo clone, or via npm install/update citeproc-test-runner? We should make sure that your running copy is up to date.

Sorry for the long wait. I’m using the npm version of the citeproc-test-runner.

> It will be preferable for your testing purposes to make the abbrevs source configurable via cslrun.yaml, so I'll work on that.

In my case where I also work on lrr-data, this indeed would be preferable. The new abbrevs go into the Zotero directories, so I’d like to point the abbrevs source to that location.

Frank Bennett · Answer 46 · Sat Oct 10 2020 19:50:37 GMT+0800 (China Standard Time)

More inspection, more learning here. Although the auto-* abbrev files are nicely laid out with optional domain extensions (for lack of a better word) following the country name, the system is designed to set just one, monolingual per-style abbreviation list, using the first match (scanning left-to-right) among the domains listed in the argument to jurisdiction-preference, set on the cs:style-options element under a cs:locale node. Once set, the abbreviation list is sticky to the style: it will not be overwritten unless there is a version increment in the source file, or the user manually adjusts an entry through the Jurism UI.

This will work okay if the abbreviations to be applied are uniform across all items rendered by the style. Unfortunately, I don't think that's going to be good enough. Some styles are not bound to a specific language. So if (for example) a user renders a document with Chicago Author-Date in English, the English abbreviations will be bound to its ID. If the same user later renders a document with the same style set to the French locale, the English abbreviations will still be used. That's definitely going to be a problem.
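
The binding problem described here can be pictured with a small sketch. It is an illustration only, based on the description in this comment rather than on the plugin's code: both class names are hypothetical, the style ID string is made up for the example, and keying by style and language is just one conceivable remedy.

```python
class StickyAbbrevStore:
    """Binds one abbreviation list per style ID, as described above."""

    def __init__(self) -> None:
        self._by_style: dict[str, str] = {}

    def language_for(self, style_id: str, locale: str) -> str:
        # The first locale seen for a style wins; later locales are ignored.
        return self._by_style.setdefault(style_id, locale)


class LanguageAwareAbbrevStore:
    """Hypothetical alternative: one abbreviation list per (style, locale)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], str] = {}

    def language_for(self, style_id: str, locale: str) -> str:
        # Each style/locale pair gets its own binding.
        return self._by_key.setdefault((style_id, locale), locale)


if __name__ == "__main__":
    sticky = StickyAbbrevStore()
    sticky.language_for("chicago-author-date", "en")
    # A later French document still gets the English list.
    print(sticky.language_for("chicago-author-date", "fr"))
```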

A second case might arise, although I'm hazy on whether we have a current need to address it. If a style intends abbreviations to follow item language (when specified), that would be a second use case requiring language-sensitivity within a given style. Do any current styles expect that behavior?

Input is very welcome on this. There will need to be a change to the database, which may be somewhat disruptive when it reaches clients in the field, and I'd like to get things right on the first attempt.

CC: @Droitslinguistiques, @georgd

Georg Mayr-Duffner · Answer 47 · Sat Oct 10 2020 19:58:24 GMT+0800 (China Standard Time)

> We have an issue to resolve in the logic around jurisdiction-preferences. In the current setup, it performs two overlapping roles. It sets the preferred style-domain variant (i.e. LegCit or IndigoBook), and also the language-domain variant (i.e. fr or de or de-AT), and the combination is confusing. (Mashing the two together is not great design, but it may not be awful enough to push us to make deep changes.)
>
> Style modules are cast for a style domain. So if a parent style prefers LegCit modules, the processor will try to find it, and fall back to the default if it is not available. Style modules are indifferent to language, so the language-domain elements in jurisdiction-preferences will not have a matching module, and will be ignored.

I mostly agree with you on that. One thing I discovered was the necessity to be very careful with naming style-module-extensions and abbreviation variants. I’d not name a style-module-extension de as this might clash with the language from abbrevs. However, I did name the jurism-un.int-deAT.csl with something that looks very much like a language code — I assumed that this module will be fine for all Austrian styles.

The statement that style modules are indifferent to language is not entirely true. Ideally they would be, but the juris-eu.int.csl file, for example, is clearly an English-language module, and that can’t be overcome until custom terms become available.

> It's less clear how selection of an abbreviation set should work. It could be driven by:
>
> * The language domain of the style;
> * The language domain of the item being rendered;
> * An arbitrary name (possibly a language code) set in jurisdiction-preferences.

~~By “language domain of the item being rendered” you mean the primary language associated with the item’s jurisdiction in the abbrevs file?~~ [No, you didn’t, I understood that when thinking about your proposal below.]

> I'm thinking that the abbrevs extension used should be the first to match among:
>
> 1. An extension listed in jurisdiction-preferences;
> 2. The item language;
> 3. The language of the style;
> 4. The default abbreviation set (as fallback).
>
> What do you think?

I was having issues logically resolving language preference collisions in multilingual jurisdictions. Bringing the item language into the game, putting it between the jurisdiction-preferences list and the language of the style, magically solves this issue. I currently have the language of the style as last item in the preferences list which definitely leads to unintended consequences. So, yes, I think this is a good idea.

One thing to consider, I think, is setting up a naming policy for abbreviation variants. I think I already mentioned it in another thread: in order to prevent unintended use of alternative abbreviations, only the official languages of a jurisdiction should be named with ISO language codes in the abbreviation files. So, e.g., there should be de, fr, it and rm variants in juris-ch-desc.json but no en, as English is not one of the official languages of Switzerland. A style that requires English abbreviations for Swiss institutions should add a non-ISO code such as enIBFD. (This is currently not an issue, as it’s exactly this way, but I think it should be stated explicitly somewhere.)

Georg Mayr-Duffner · Answer 48 · Sat Oct 10 2020 20:05:55 GMT+0800 (China Standard Time)

I’ll come back on this later. Just one remark:

> ... using the first match (scanning left-to-right) among the domains listed in the argument to jurisdiction-preference ...

Are you sure about left-to-right scanning? When I tested this with style-module extensions, I discovered that I had to put the preferred domain right-most. Is this different for language preference?

Frank Bennett · Answer 49 · Sat Oct 10 2020 20:17:39 GMT+0800 (China Standard Time)

Ha. I hadn't checked, and thought to myself, "Um, wonder if that's right?" Apparently I was wrong. Right-to-left. :-/

Georg Mayr-Duffner · Answer 50 · Sun Oct 11 2020 03:02:10 GMT+0800 (China Standard Time)

> Ha. I hadn't checked, and thought to myself, "Um, wonder if that's right?" Apparently I was wrong. Right-to-left. :-/

I was prepared to check. When you told me about that attribute, you explicitly said that you didn’t remember in which direction the values were evaluated.

Georg Mayr-Duffner · Answer 51 · Sun Oct 11 2020 03:44:50 GMT+0800 (China Standard Time)

> More inspection, more learning here. Although the auto-* abbrev files are nicely laid out with optional domain extensions (for lack of a better word) following the country name, the system is designed to set just one, monolingual per-style abbreviation list, using the first match (scanning left-to-right) among the domains listed in the argument to jurisdiction-preference, set on the cs:style-options element under a cs:locale node. Once set, the abbreviation list is sticky to the style: it will not be overwritten unless there is a version increment in the source file, or the user manually adjusts an entry through the Jurism UI.

I’m not sure if I understand correctly: So, currently, if a style lists a language domain extension in its jurisdiction-preference, that language domain (the first seen by the processor) is bound to the style. A second language domain in that attribute is ignored. I think that looks logical. But I’m not sure about the version increment in the source file. How is that detected? And the manual adjustment of an entry doesn’t really modify the relation between style and language domain but only changes the output?

> This will work okay if the abbreviations to be applied are uniform across all items rendered by the style. Unfortunately, I don't think that's going to be good enough. Some styles are not bound to a specific language. So if (for example) a user renders a document with Chicago Author-Date in English, the English abbreviations will be bound to its ID. If the same user later renders a document with the same style set to the French locale, the English abbreviations will still be used. That's definitely going to be a problem.

I’m curious: is this already a thing? Does using Chicago Author-Date in a certain language already bind the style to that language? Anyways, agreed, that's really a problem.

> A second case might arise, although I'm hazy on whether we have a current need to address it. If a style intends abbreviations to follow item language (when specified), that would be a second use case requiring language-sensitivity within a given style. Do any current styles expect that behavior?

IMO, in juridic styles, this wouldn’t be uncommon when it comes to citations from multilingual jurisdictions:

* Example 1: Cases at the ICJ might be cited in French or English and the abbreviation should be chosen accordingly. I’d say this should be consistently in one language throughout a document, but another document might choose the other variant.
* Example 2: Belgium has three official languages, French, Dutch and German (the latter of much lesser importance), so abbreviations for all three would exist. When I cite a Belgian case in a document that uses an Austrian style, I’ll most probably use either Dutch or French but not German. Given the special constellation in Belgium, the preferred abbreviation might be determined by the case itself. So, this can’t be done in the jurisdiction-preference attribute and shouldn’t rely on the style or document locale. A Belgian style, on the other hand, might well set a preferred locale.

> Input is very welcome on this. There will need to be a change to the database, which may be somewhat disruptive when it reaches clients in the field, and I'd like to get things right on the first attempt.

How disruptive? Would entries have to be touched manually? Or do you expect any other necessary intervention by the user?

Georg Mayr-Duffner · Answer 52 · Sun Oct 11 2020 04:00:44 GMT+0800 (China Standard Time)

> The abbreviation source directory can be set from the command line with an -A= or --abbreviations option;

If anyone else tries this: there’s no = after -A or --abbreviations

> Abbreviations are drawn only from the primary named file, ignoring extensions.

That’s what I experience :)

Georg Mayr-Duffner · Answer 53 · Sun Oct 11 2020 07:17:04 GMT+0800 (China Standard Time)

@fbennett do you think you could do a release before starting work on the variant issue, containing the leg-cit and IBFD styles, style modules and abbreviations? I'll create the pull requests later today. There's a bunch of scholars eager to get acquainted with Juris-M :)

Frank Bennett · Answer 54 · Sun Oct 11 2020 08:03:56 GMT+0800 (China Standard Time)

> > More inspection, more learning here. Although the auto-* abbrev files are nicely laid out with optional domain extensions (for lack of a better word) following the country name, the system is designed to set just one, monolingual per-style abbreviation list, using the first match (scanning left-to-right) among the domains listed in the argument to jurisdiction-preference, set on the cs:style-options element under a cs:locale node. Once set, the abbreviation list is sticky to the style: it will not be overwritten unless there is a version increment in the source file, or the user manually adjusts an entry through the Jurism UI.
>
> I’m not sure if I understand correctly: So, currently, if a style lists a language domain extension in its jurisdiction-preference, that language domain (the first seen by the processor) is bound to the style. A second language domain in that attribute is ignored. I think that looks logical. But I’m not sure about the version increment in the source file. How is that detected? And the manual adjustment of an entry doesn’t really modify the relation between style and language domain but only changes the output?

Abbrev source file versions are bumped automatically by the script that compiles them from desc. The abbrevs plugin in the client sets the version in a database table, and checks the declared version in the source against the DB version of each at startup. If there is a discrepancy, it wipes out all abbrevs for the country and reinstalls them.
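
The startup check described in this paragraph can be sketched roughly as below. This is a simplified illustration under assumptions, not the abbrevs plugin's code: the function name `sync_abbrevs`, the table names `versions` and `abbrevs`, and their columns are all hypothetical, and an in-memory SQLite database stands in for the client database.

```python
import sqlite3


def sync_abbrevs(db, country, source_version, source_entries):
    """Reinstall a country's abbreviations when the source version differs.

    Returns True if the stored abbreviations were wiped and reinstalled,
    False if the stored version already matched the source version.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS versions "
        "(country TEXT PRIMARY KEY, version INTEGER)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS abbrevs (country TEXT, key TEXT, value TEXT)"
    )
    row = db.execute(
        "SELECT version FROM versions WHERE country = ?", (country,)
    ).fetchone()
    if row is not None and row[0] == source_version:
        return False
    # Discrepancy (or first install): wipe everything for the country.
    db.execute("DELETE FROM abbrevs WHERE country = ?", (country,))
    db.executemany(
        "INSERT INTO abbrevs VALUES (?, ?, ?)",
        [(country, key, value) for key, value in source_entries.items()],
    )
    db.execute(
        "INSERT OR REPLACE INTO versions VALUES (?, ?)", (country, source_version)
    )
    return True


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(sync_abbrevs(connection, "at", 1, {"ogh": "OGH"}))
    print(sync_abbrevs(connection, "at", 1, {"ogh": "OGH"}))
```

In this sketch a manual edit to a stored entry survives as long as the version is unchanged and is lost on the next version bump, because the whole country is wiped before reinstalling.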

You're right about manual adjustment: it would just change the output for the court (or whatever) everywhere and always when the style is used.

This will work okay if the abbreviations to be applied are uniform across all items rendered by the style. Unfortunately, I don't think that's going to be good enough. Some styles are not bound to a specific language. So if (for example) a user renders a document with Chicago Author-Date in English, the English abbreviations will be bound to its ID. If the same user later renders a document with the same style set to the French locale, the English abbreviations will still be used. That's definitely going to be a problem.

I’m curious: is this already a thing? Does using Chicago Author-Date in a certain language already bind the style to that language? Anyway, agreed, that's really a problem.

This is just my conclusion from looking at the code, but I'm pretty sure that's what would happen.
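
To make the concern concrete, here is a small Python sketch of an abbreviation list that is bound to the style ID alone. The class `StickyAbbrevStore`, the `load` callback, and the placeholder abbreviation values are hypothetical; the real logic lives in the Abbrevs plugin and may differ in detail.

```python
class StickyAbbrevStore:
    """Hypothetical store that binds one abbreviation list to each style ID."""

    def __init__(self):
        self._by_style = {}

    def get(self, style_id, locale, load):
        # The locale is consulted only the first time a style is seen;
        # afterwards the stored list is returned regardless of locale.
        if style_id not in self._by_style:
            self._by_style[style_id] = load(locale)
        return self._by_style[style_id]


def load(locale):
    # Stand-in for reading the abbreviation file for a locale (placeholder data).
    return {"en": {"court": "EN-ABBR"}, "fr": {"court": "FR-ABBR"}}[locale]


store = StickyAbbrevStore()
first = store.get("chicago-author-date", "en", load)
second = store.get("chicago-author-date", "fr", load)  # still the English list
```

Keying the store on the pair of style ID and locale instead would avoid the stale result, at the cost of holding one list per locale.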

A second case might arise, although I'm hazy on whether we have a current need to address it. If a style intends abbreviations to follow item language (when specified), that would be a second use case requiring language-sensitivity within a given style. Do any current styles expect that behavior?

IMO, in juridic styles, this wouldn’t be uncommon when it comes to citations from multilingual jurisdictions:

This is good to know.

* Example 1: Cases at the ICJ might be cited in French or English, and the abbreviation should be chosen accordingly. I’d say this should be consistent within a document, but another document might choose the other variant.

* Example 2: Belgium has three official languages, French, Dutch and German (the latter of much lesser importance), so abbreviations for all three would exist. When I cite a Belgian case in a document that uses an Austrian style, I’ll most probably use either Dutch or French but not German. Given the special constellation in Belgium, the preferred abbreviation might be determined by the case itself. So this can’t be done in the `jurisdiction-preference` attribute and shouldn’t rely on the style or document locale. A Belgian style, on the other hand, might well set a preferred locale.

Input is very welcome on this. There will need to be a change to the database, which may be somewhat disruptive when it reaches clients in the field, and I'd like to get things right on the first attempt.

How disruptive? Would entries have to be touched manually? Or do you expect any other necessary intervention by the user?

In previous upgrades to the abbrevs filter that touched the DB tables, I've scrubbed everything, so users had to start over. Now that the client is seeing wider use, that would be unpleasant. I was thinking that adding a column to track language would require cloning and rewriting all of the user's existing abbreviations---possible, but a heavy and complex operation with a risk of missteps.

It turns out, though, that we've been lucky. The schema of the abbreviations table maintained by the plugin does not enforce uniqueness (which was careless), but the entries are currently unique because the code that writes into it protects against duplicate entries. With that as a starting point, SQLite has a command for adding a column to a table non-destructively. So we can add a domain column to hold the label for alternatives, and use that to discriminate between abbreviation sets when calling the DB.
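
As a rough illustration of the idea, the sketch below uses Python's `sqlite3` module and SQLite's `ALTER TABLE ... ADD COLUMN`. The table layout (`listname`, `rawval`, `abbr`) and the `lookup` helper are assumptions for the example, not the plugin's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-existing table without a language/domain column.
conn.execute("CREATE TABLE abbreviations (listname TEXT, rawval TEXT, abbr TEXT)")
conn.execute("INSERT INTO abbreviations VALUES ('my-style', 'court.appeal', 'DEFAULT-ABBR')")

# Non-destructive change: existing rows are kept and get NULL in the new column.
conn.execute("ALTER TABLE abbreviations ADD COLUMN domain TEXT")
conn.execute("INSERT INTO abbreviations VALUES ('my-style', 'court.appeal', 'NL-ABBR', 'nl')")


def lookup(conn, listname, rawval, domain=None):
    """Return the abbreviation for rawval in the given domain (None means the default set)."""
    row = conn.execute(
        "SELECT abbr FROM abbreviations WHERE listname = ? AND rawval = ? AND domain IS ?",
        (listname, rawval, domain),
    ).fetchone()
    return row[0] if row else None
```

The `IS` comparison is used so that the same query matches both a named domain and the NULL that pre-existing rows carry.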

That solves the fundamental problem. The remaining work will be in adjusting code in the client and the plugin to take advantage of the new capability. There will be some jiggery-pokery to work around the fact that the core code for resolving abbrevs in citeproc-js is (a) not domain-aware and (b) relied upon by Zotero among others. But the work there shouldn't be too bad. It will take a while to get this fixed, but the path forward seems clear.

Assuming that on-the-fly selection of abbreviation sets can be implemented, we should settle how it is to be controlled. In the current setup, selections are driven exclusively by the jurisdiction-preference attribute to cs:style-options under cs:locale. The list argument to that attribute controls both selection of style modules and selection of abbreviation sets. The two are not necessarily mutually dependent, and I'm a little worried that combining the two will cause confusion. It seems there may be a case for a separate abbrev-preference attribute, for clarity. Or maybe it's okay the way it is. I'm torn.

About inheritance ... I think that we'll be okay, for inheritance of both abbrev and module settings, looking first to cs:style-options on the immediate locale of the item for a match (if any), then to the global locale (if any), then to the default.
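
The lookup order for inheritance might be sketched like this in Python. The function name and the dictionary representation of per-locale `cs:style-options` settings are assumptions for illustration only.

```python
def resolve_style_options(options_by_locale, item_locale=None, global_locale=None, default=None):
    """Pick settings from the item's locale first, then the global locale, then the default.

    options_by_locale maps a locale code to whatever is set on cs:style-options
    under the matching cs:locale node (hypothetical representation).
    """
    for locale in (item_locale, global_locale):
        if locale is not None and locale in options_by_locale:
            return options_by_locale[locale]
    return default


options = {"fr": ["fr"], "en": ["enIBFD"]}
print(resolve_style_options(options, item_locale="fr", global_locale="en", default=[]))
print(resolve_style_options(options, item_locale="de", global_locale="en", default=[]))
```

The same function could serve for both abbrev and module settings, since the text proposes the same order for each.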

Frank Bennett · Answer 55 · Sun Oct 11 2020 08:07:33 GMT+0800 (China Standard Time)

> @fbennett do you think you could do a release before starting work on the variant issue, containing the leg-cit and IBFD styles, style modules and abbreviations? I'll create the pull requests later today. There's a bunch of scholars eager to get acquainted with Juris-M :)

Can do. It might be worth considering whether some of your modules and abbreviation sets might be promoted to the default, since the terrain is pretty sparsely populated.

Frank Bennett · Answer 56 · Sun Oct 11 2020 11:12:00 GMT+0800 (China Standard Time)

Um ... from the internal monologue on this ... I was mistaken about being able to just add a column. It seems that freshly installed instances of Jurism do set a UNIQUE constraint on the abbreviations table that will need to be modified. No worries, though, I've set up and tested the operations for recreating the table with a newly created schema. The updated table will work fine with the existing functions, so I'll bundle the update with our next release, and then work on the improved abbrev-selection feature in the beta as time permits.
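
For readers unfamiliar with the procedure, recreating a table to change a UNIQUE constraint generally follows the pattern below (create the new table, copy rows, drop the old one, rename). This is a Python sketch under assumed column names, not the operations actually bundled with the release.

```python
import sqlite3


def add_domain_to_unique_key(conn):
    """Rebuild the (hypothetical) abbreviations table so that 'domain' is part of the UNIQUE key."""
    conn.execute(
        "CREATE TABLE abbreviations_new ("
        " listname TEXT, rawval TEXT, abbr TEXT,"
        " domain TEXT NOT NULL DEFAULT '',"
        " UNIQUE (listname, rawval, domain))"
    )
    # Copy existing rows; they land in the default domain ('').
    conn.execute(
        "INSERT INTO abbreviations_new (listname, rawval, abbr)"
        " SELECT listname, rawval, abbr FROM abbreviations"
    )
    conn.execute("DROP TABLE abbreviations")
    conn.execute("ALTER TABLE abbreviations_new RENAME TO abbreviations")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abbreviations (listname TEXT, rawval TEXT, abbr TEXT, UNIQUE (listname, rawval))"
)
conn.execute("INSERT INTO abbreviations VALUES ('my-style', 'court.appeal', 'DEFAULT-ABBR')")
add_domain_to_unique_key(conn)
# The same key can now exist once per domain.
conn.execute("INSERT INTO abbreviations VALUES ('my-style', 'court.appeal', 'NL-ABBR', 'nl')")
```

The sketch uses an empty string rather than NULL for the default domain because SQLite treats NULLs as distinct in UNIQUE constraints, which would let duplicate default rows slip through.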

Samuel Gagnon · Answer 57 · Mon Oct 12 2020 00:12:50 GMT+0800 (China Standard Time)

Hello!

I appreciate you cc'ing me on this topic. However, I have read through this whole thread, and I'm afraid that I am completely out of my depth on most of these things.

I've mostly been able to code through trial and error so far, and I'm definitely missing a lot of the background knowledge needed to understand what the issues are.

However, I really do want to help. I'm just not sure I understand enough to give any kind of useful feedback.

Georg Mayr-Duffner · Answer 58 · Mon Oct 12 2020 03:11:59 GMT+0800 (China Standard Time)

@Droitslinguistiques the thread covers several different things. The relevant portion starts at #151 (comment).

The question is about the logic behind choosing the language variant from the abbreviation files:

It's less clear how selection of an abbreviation set should work. It could be driven by:

* The language domain of the style;

* The language domain of the item being rendered;

* An arbitrary name (possibly a language code) set in jurisdiction-preference.

I'm thinking that the abbrevs extension used should be the first to match among:

* An extension listed in jurisdiction-preference. This means that if an abbreviation for that extension exists, it will be applied unconditionally. If you list "fr" in the jurisdiction-preference of your Canadian style, French abbrevs will always be applied, if they exist.

* The item language. The user can set the language of the abbrev by setting a specific language in every single item.

* The language of the style. If the style itself is bound to a language, this language variant is the next eligible from the abbrevs.

* The default abbreviation set (as fallback).
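
The proposed first-match order can be written down as a short Python sketch. The function name, its parameters, and the use of `None` to stand for the default set are assumptions for illustration; nothing here reflects how the Abbrevs Filter is actually implemented.

```python
def select_abbrev_domain(available, jurisdiction_preference=(), item_language=None, style_language=None):
    """Return the first candidate domain for which an abbreviation set exists.

    Candidates are tried in the proposed order: extensions listed in
    jurisdiction-preference, then the item language, then the style language.
    None is returned when nothing matches, meaning "use the default set".
    """
    candidates = [*jurisdiction_preference, item_language, style_language]
    for domain in candidates:
        if domain is not None and domain in available:
            return domain
    return None


available = {"fr", "nl"}
print(select_abbrev_domain(available, ["enIBFD", "fr"], item_language="nl", style_language="de"))
print(select_abbrev_domain(available, [], item_language="nl", style_language="de"))
```

With this order, an extension named in jurisdiction-preference wins over the item language whenever a matching set exists, which is exactly the point the questions that follow ask about.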

How do you think this selection should be made? When you cite Canadian sources, would you always select the French abbreviations unconditionally? Also, even if it is less common, how would you want to see foreign sources cited?

Georg Mayr-Duffner · Answer 59 · Mon Oct 12 2020 03:44:25 GMT+0800 (China Standard Time)

> Can do. It might be worth considering whether some of your modules and abbreviation sets might be promoted to the default, since the terrain is pretty sparsely populated.

I considered it.

The IBFD sources for the abbreviations are not complete but they could serve as a start. I only added IBFD variants where they add an English translation so far.
The leg-cit style modules are very specific. The next style will very probably result in more general modules.

Samuel Gagnon · Answer 60 · Mon Oct 12 2020 04:19:39 GMT+0800 (China Standard Time)

Thank you, that makes much more sense to me.

To answer your question, the way the McGill guide generally handles languages is to check whether the source is available in the author's language and, if not, to revert to the original source language.

So, for example, if I'm quoting a case from a bilingual jurisdiction that is available in both languages, I'd always use the French citation and abbrevs, regardless of whether I was quoting the text of the English version or not. However, if the jurisdiction isn't bilingual, I have to use the English abbrevs. To complicate things, if I'm writing my text in French, then the structure of the citation itself has to follow the French rules, even if I'm using all-English abbrevs when quoting an English-only case.

The McGill guide does have rules that specify when to use French over English and vice versa. However, my copy is in my office at the university, and I can't retrieve it until Tuesday.

The problem I'm seeing is particular to the Canadian context: only some jurisdictions have two official languages. Most provinces are unilingual English. As such, the French abbrevs should never be used to refer to a court in a unilingually English province.

However, instead of having that determined by the style, I've just submitted a new set of abbreviations to the LRD, which hard-codes English-only abbrevs for the English provinces. I made it so that someone writing with the default abbreviations would have English abbreviations everywhere, and someone writing in French would have French abbreviations in bilingual jurisdictions but English ones everywhere else. There were a few really weird edge cases, but I managed to address every scenario I could think of.

However, that was only possible because of the specificity of Canada's language laws. I don't think that this is an idea that could be exported elsewhere.

As a final thought, I definitely think that having the language of the item low in the priority list is a good idea. Specifying the language for every single item seems very tedious, so I like the idea of it being an exception rather than the rule.

Frank Bennett · Answer 61 · Mon Oct 12 2020 07:44:24 GMT+0800 (China Standard Time)

> @fbennett do you think you could do a release before starting work on the variant issue, containing the leg-cit and IBFD styles, style modules and abbreviations? I'll create the pull requests later today. There's a bunch of scholars eager to get acquainted with Juris-M :)

> Can do. It might be worth considering whether some of your modules and abbreviation sets might be promoted to the default, since the terrain is pretty sparsely populated.

Understood about defaults. The more I look at this, though, the more I find myself thinking that we should get abbreviation arbitration fixed before making a release. A lot of thorny work is needed for it in the Abbrevs Filter, but for any multilingual work citations are often going to come out wrong until it's fixed. Knowing the sensitivity of lawyers to accuracy, I would worry that users might drop the tool and not come back if it doesn't produce correct output when they first pick it up.

Georg Mayr-Duffner · Answer 62 · Mon Oct 12 2020 08:08:43 GMT+0800 (China Standard Time)

In this case, the people I'm talking about will work with the IBFD style, which is doing fine in this regard. And I think postponing the introduction there might create more inconvenience.

From my side, it's all set so far, and the PRs are done. :)

Frank Bennett · Answer 63 · Mon Oct 12 2020 08:32:54 GMT+0800 (China Standard Time)

Okay. Let's do one more beta so you can take a look at the finished product. If that looks good, let's pull the cord on a release!

Frank Bennett · Answer 64 · Mon Oct 12 2020 10:25:13 GMT+0800 (China Standard Time)

The beta update is out. It's passed the client tests, but I haven't tried it with the new styles. If there are problems, just give a shout.

Georg Mayr-Duffner · Answer 65 · Mon Oct 12 2020 21:31:25 GMT+0800 (China Standard Time)

@fbennett the beta works well for IBFD. I found an issue with leg-cit and statutes disambiguation which will need my attention. But that shouldn’t stop the release.

As the original issue of this thread seems really solved, btw (yeah, thanks again for the great work!), would you mind moving the discussion down from #151 (comment) to a new issue?

Frank Bennett · Answer 66 · Tue Oct 13 2020 01:26:16 GMT+0800 (China Standard Time)

Okay, let's do this! More soon.

Frank Bennett · Answer 67 · Tue Oct 13 2020 04:12:25 GMT+0800 (China Standard Time)

The release is up, for Linux, Windows, and Mac. On Linux and Windows, the version is 5.0.90m4. On the Mac it is 5.0.90m5. The discrepancy arose b/c I forgot to pull some changes to the Abbrevs plugin on the first Mac build. The code is identical for the m4 and m5 versions.

Will open a new issue for the domain-arbitration issues around abbrevs.

Georg Mayr-Duffner · Answer 68 · Tue Oct 13 2020 04:46:00 GMT+0800 (China Standard Time)

Thank you very much!

Georg Mayr-Duffner · Answer 69 · Tue Oct 13 2020 04:59:58 GMT+0800 (China Standard Time)

> More inspection, more learning here. Although the auto-* abbrev files are nicely laid out with optional domain extensions (for lack of a better word) following the country name, the system is designed to set just one, monolingual per-style abbreviation list, using the first match (scanning left-to-right) among the domains listed in the argument to jurisdiction-preference, set on the cs:style-options element under a cs:locale node. Once set, the abbreviation list is sticky to the style: it will not be overwritten unless there is a version increment in the source file, or the user manually adjusts an entry through the Jurism UI.

> I’m not sure I understand correctly: currently, if a style lists a language domain extension in its jurisdiction-preference, that language domain (the first seen by the processor) is bound to the style, and a second language domain in that attribute is ignored. That looks logical to me. But I’m not sure about the version increment in the source file. How is that detected? And the manual adjustment of an entry doesn’t really modify the relation between style and language domain; it only changes the output?

> This will work okay if the abbreviations to be applied are uniform across all items rendered by the style. Unfortunately, I don't think that's going to be good enough. Some styles are not bound to a specific language. So if (for example) a user renders a document with Chicago Author-Date in English, the English abbreviations will be bound to its ID. If the same user later renders a document with the same style set to the French locale, the English abbreviations will still be used. That's definitely going to be a problem.

Sorry for continuing here, but I’m observing some behaviour that makes me wonder whether I understood your explanation correctly. What I read from it is that only the first abbrev extension is applied per style. However, the IBFD style lists enIBFD and englished (which fits well, so I didn’t have to add another variant to juris-jp-desc), and both get applied as expected. So at least the jurisdiction-preference part seems to work well already?

Frank Bennett · Answer 70 · Tue Oct 13 2020 05:32:22 GMT+0800 (China Standard Time)

The problem would arise only with a style that does not set default-locale, where style-options with jurisdiction-preference is set on multiple locales with different values. The LegCit styles won't be affected by it.

Frank Bennett · Answer 71 · Wed Oct 14 2020 19:26:12 GMT+0800 (China Standard Time)

I think I have working code for locale arbitration of abbreviation sets. Coming soon after some code cleanup.