Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

APA-style subtitle capitalization shouldn't apply to words with internal capitalization

dstillman opened this issue

For instance, this paper (https://pubmed.ncbi.nlm.nih.gov/30217542/) is rendered as "A face is more than just the eyes, nose, and mouth: FMRI evidence that face-selective cortex represents external features." It should be "fMRI."

zotero/zotero#1958

Oh, so it seems like this is just a general problem of title casing not respecting internal capitalization, not a problem with subtitle capitalization specifically. I can reproduce this with Chicago style.

I can fix this, but I'm somewhat surprised by it — I would think we'd be getting reports all the time of things with internal capitalization (e.g., "iPhone") getting incorrectly capitalized in title-case styles. I wonder if some styles do actually call for naive title-casing, regardless of internal capitalization.

https://whoo.ps/2013/10/23/how-do-you-capitalize-iphone-or-ipad quotes the Chicago guide on this:

Brand names or names of companies that are spelled with a lowercase initial letter followed by a capital letter (eBay, iPod, iPhone, etc.) need not be capitalized at the beginning of a sentence or heading, though some editors may prefer to reword.

What you're saying is correct, @dstillman, and should be fixed in the CSL spec.

Sorry, I don't know how to build citeproc-js or write tests, but here's a patch that seems to fix this problem. It checks to see if a word is all lowercase (i.e., no existing capitalization), and only then does it capitalize.

If someone who knows how to do the above is able to test and merge this, I'd appreciate it.

diff --git a/src/formatters.js b/src/formatters.js
index af7350ab..fdd74727 100644
--- a/src/formatters.js
+++ b/src/formatters.js
@@ -286,23 +286,29 @@ CSL.Output.Formatters = (function () {
                     var words = wordle.strings;
                     for (var j=0,jlen=words.length;j<jlen;j++) {
                         var word = words[j];
                         if (!word) {
                             continue;
                         }
-                        if (word.length > 1 && !CSL.toLocaleLowerCase.call(state, word).match(config.skipWordsRex)) {
+                        let lcase = CSL.toLocaleLowerCase.call(state, word);
+                        let capitalize = false;
+                        if (word.length > 1 && !lcase.match(config.skipWordsRex)) {
                             // Capitalize every word that is not a stop-word
-                            words[j] = _capitalise.call(state, words[j]);
+                            capitalize = true;
                         } else if (j === (words.length - 1) && followingTag === "-") {
-                            words[j] = _capitalise.call(state, words[j]);
+                            capitalize = true;
                         } else if (config.isFirst) {
                             // Capitalize first word, even if a stop-word
-                            words[j] = _capitalise.call(state, words[j]);
+                            capitalize = true;
                         } else if (config.afterPunct) {
                             // Capitalize after punctuation
-                            words[j] = _capitalise.call(state, words[j]);
+                            capitalize = true;
+                        }
+                        // Don't capitalize if word already contains capitalization
+                        if (capitalize && word === lcase) {
+                            words[j] = _capitalise.call(state, word);
                         }
                         config.afterPunct = false;
                         config.isFirst = false;
                         config.lastWordPos = {
                             strings: i,
                             words: j
@@ -331,13 +337,16 @@ CSL.Output.Formatters = (function () {
             capitaliseWords: function(str) {
                 var words = str.split(" ");
                 for (var i=0,ilen=words.length;i<ilen;i++) {
                     var word = words[i];
                     if (word) {
                         if (config.isFirst) {
-                            words[i] = _capitalise.call(state, word);
+                            // Don't capitalize if word already contains capitalization
+                            if (word == CSL.toLocaleLowerCase.call(state, word)) {
+                                words[i] = _capitalise.call(state, word);
+                            }
                             config.isFirst = false;
                             break;
                         }
                     }
                 }
                 return words.join(" ");
@@ -361,13 +370,16 @@ CSL.Output.Formatters = (function () {
             quoteState: [],
             capitaliseWords: function(str) {
                 var words = str.split(" ");
                 for (var i=0,ilen=words.length;i<ilen;i++) {
                     var word = words[i];
                     if (word) {
-                        words[i] = _capitalise.call(state, word);
+                        // Don't capitalize if word already contains capitalization
+                        if (word == CSL.toLocaleLowerCase.call(state, word)) {
+                            words[i] = _capitalise.call(state, word);
+                        }
                     }
                 }
                 return words.join(" ");
             },
             skipWordsRex: null,
             tagState: [],
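
For readers who want to see the patch's core check in isolation, here is a minimal sketch. The helper names `capitalizeFirstLetter` and `capitalizeIfAllLowercase` are hypothetical stand-ins (citeproc-js itself uses `_capitalise` and `CSL.toLocaleLowerCase`), and plain `toLowerCase()` is assumed in place of the locale-aware call.

```javascript
// Hypothetical stand-in for citeproc-js's internal _capitalise helper.
function capitalizeFirstLetter(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// Capitalize only when the word carries no capitalization of its own,
// i.e. when it is identical to its lowercased form.
function capitalizeIfAllLowercase(word) {
    if (word === word.toLowerCase()) {
        return capitalizeFirstLetter(word);
    }
    return word;
}

var samples = ["evidence", "fMRI", "iPhone", "DNA"];
var results = samples.map(capitalizeIfAllLowercase);
console.log(results.join(" ")); // Evidence fMRI iPhone DNA
```

Words such as "fMRI" differ from their lowercased form, so they pass through untouched, while ordinary lowercase words are still capitalized.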

OK, I figured out how to build citeproc-js and run tests. Currently failing on two tests:

  1508 passing (11s)
  2 failing

  1) Integration tests should pass bugreports_SelfLink:

      AssertionError: /Users/dan/citeproc-js/fixtures/std/processor-tests/humans/bugreports_SelfLink.txt
      + expected - actual

      -[o A. <i>Book Title</i>]
      +[O A. <i>Book Title</i>]

      at Context.<anonymous> (.cslTestFixtures/fixtures.js:87240:24)
      at processImmediate (internal/timers.js:461:21)

  2) Integration tests should pass bugreports_SingleQuoteXml:

      AssertionError: /Users/dan/citeproc-js/fixtures/std/processor-tests/humans/bugreports_SingleQuoteXml.txt
      + expected - actual

      -[Cite with a composer ; o A. "hello" <i>Book Title</i>]
      +[Cite with a composer ; O A. "hello" <i>Book Title</i>]

      at Context.<anonymous> (.cslTestFixtures/fixtures.js:87240:24)
      at processImmediate (internal/timers.js:461:21)

Looking into how I caused those.

Those errors are caused by the patch to capitaliseWords(), but I'm not understanding that at all — if my patch is conditionally not capitalizing when the string isn't all lowercase, when before it was always capitalizing, I don't understand how that's causing a lowercase "o" to be capitalized when it wasn't before. That seems…backwards?

I'm going to have to set this aside for the moment, so if someone else wants to take a crack at it, go for it.

@dstillman You are reading the output backwards. After your patch, the o is not being capitalized, but it should be capitalized.

Oh, I was, but someone should fix that patch output: it lists "+ expected - actual" above and then displays actual before expected below. At the very least the order should be the same, but I would say "-" should be expected and "+" should be actual.

Someone did not read carefully, and has now reverted their change.
Juris-M/citeproc-test-runner@679a049
The output is directly from Mocha. If it's a problem, it's an issue for upstream.

I've added a fix for the capitalization issue. aa2683f
In the two test failures flagged, the target of the textcase operation was a term containing a non-breaking space. The non-breaking space was not recognized as a split point, so the whole term was treated as a single word that already contained uppercase characters, and was therefore skipped. In addition to the fix for capitalization of cute product names etc., the commit invokes markup splitting in the capitalize-all and capitalize-first functions, rather than simply splitting on a space character, which was a bug report waiting to happen.
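
The interaction described above can be reproduced with a small sketch. This is not the code from the commit: the actual fix uses citeproc-js's markup splitting, whereas the version below simply splits on any whitespace run as an illustration. The function names are hypothetical, and the sample term ("o", a non-breaking space, then "A.") is an assumption based on the test output quoted earlier in the thread.

```javascript
var NBSP = "\u00A0";

function capitalizeFirstLetter(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// Same check as the patch: leave words with existing capitals alone.
function capitalizeIfAllLowercase(word) {
    return word === word.toLowerCase() ? capitalizeFirstLetter(word) : word;
}

// Naive version: splits on the ordinary space character only.
function capitalizeAllNaive(str) {
    return str.split(" ").map(capitalizeIfAllLowercase).join(" ");
}

// Whitespace-aware version: \s also matches U+00A0, and the capture
// group keeps each separator so the string can be rejoined unchanged.
function capitalizeAllWhitespaceAware(str) {
    return str.split(/(\s+)/).map(function (part) {
        return /^\s+$/.test(part) ? part : capitalizeIfAllLowercase(part);
    }).join("");
}

// Assumed shape of the failing term: one "word" as far as split(" ") can tell.
var term = "o" + NBSP + "A.";

console.log(capitalizeAllNaive(term) === term);                         // true (left untouched)
console.log(capitalizeAllWhitespaceAware(term) === "O" + NBSP + "A.");  // true
```

With the naive split, the term is a single token that contains an uppercase "A", so the all-lowercase check rejects it and the leading "o" stays lowercase, which matches the two failures.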

Closing this. Feel free to reopen at will if it's still broken and messed up (in addition to being ugly).