Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

Q: Re-use an engine

larsgw opened this issue

On the CSL forum, @fbennett mentioned that it's best to re-use citeproc engines when possible. However, this becomes a problem when someone tries to format different entries with the same ID (citation-js/citation-js#82). I found a fix where I replace the registry (citation-js/citation-js@efa648b). However, this breaks any form of sequential formatting, because the engine then tries to get disambiguation info from a now-empty registry (citation-js/citation-js#101). Is it better to just cache the parsed XML, or is there a better way?

Just for reference, adding this seems to work, but I am unsure whether it covers all problems:

engine.disambiguate = new CSL.Disambiguation(engine);
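A minimal sketch of that workaround as a single helper, assuming (as in the snippets in this thread) that `CSL.Registry` and `CSL.Disambiguation` each take the engine as their only constructor argument. `resetEngineState` is a hypothetical name, and `registry` and `disambiguate` are internal engine properties rather than documented API, so this may break between citeproc-js versions.

```javascript
// Hypothetical helper: replace the engine's item registry and its
// disambiguation state together, so neither refers to stale items.
// CSL is passed in (e.g. require('citeproc')) so the helper needs no import.
function resetEngineState (CSL, engine) {
  // Both properties are engine internals (assumption based on this thread).
  engine.registry = new CSL.Registry(engine)
  engine.disambiguate = new CSL.Disambiguation(engine)
  return engine
}
```

It would be called as `resetEngineState(CSL, engine)` followed by `engine.updateItems(Object.keys(data))`.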

Not sure I understand the case fully, but does it work to clear the registry by invoking citeproc.updateItems([]) (with an empty array)? (https://citeproc-js.readthedocs.io/en/latest/running.html#updateitems)
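Using only the documented `updateItems` call, a full clear-and-reload might look like the sketch below. `reloadAllItems` is a hypothetical helper; `engine` is assumed to be a `CSL.Engine` whose `retrieveItem` reads from the same `data` object as in the examples in this thread.

```javascript
// Hypothetical helper: empty the registry, then register the current items,
// so every item is fetched again through sys.retrieveItem on the second call.
function reloadAllItems (engine, data) {
  engine.updateItems([])                 // drop everything from the registry
  engine.updateItems(Object.keys(data))  // re-add the current items
}
```

This touches every item, so it is the simple but slower option when only a few entries have changed.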

This code reproduces the error (https://runkit.com/larsgw/citeproc-js-caching):

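const CSL = require('citeproc')
const fetch = require('node-fetch') // node-fetch v2; v3 is ESM-only and cannot be require()d

// Top-level await works on RunKit; in a plain Node.js (CommonJS) script, wrap the code below in an async function.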
const style = await fetch(
    'https://cdn.jsdelivr.net/gh/citation-style-language/styles@master/apa.csl'
  ).then(response => response.text())
const locale = await fetch(
    'https://cdn.jsdelivr.net/gh/citation-style-language/locales@master/locales-en-US.xml'
  ).then(response => response.text())

function formatBibliography ([data, entries]) {
  console.log(data.bibstart + entries.join('') + data.bibend)
}

const data = {}

const engine = new CSL.Engine({
  retrieveLocale () { return locale },
  retrieveItem (id) { return data[id] }
}, style, 'en-US')

// If you change an item while keeping the same identifier, it does not reset:
data.a = { id: 'a', title: 'a', author: [{ family: 'a' }] }
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// "a. (n.d.). a."

// Change the item
data.a = { id: 'a', title: 'b', author: [{ family: 'a' }] }
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// still "a. (n.d.). a.": the new title is ignored

// Resetting the engine by making a new registry does work...
engine.registry = new CSL.Registry(engine)
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// "a. (n.d.). b."

// ...but when I remove an item and add a new item with the same author, it breaks:
delete data.a
data.c = { id: 'c', title: 'c', author: [{ family: 'a' }] }
engine.registry = new CSL.Registry(engine)
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// TypeError: Cannot read property 'id' of undefined

Surprising that this hasn't come up before. The registry doesn't have an efficient way of detecting changes in data behind an ID, and we don't have an update function to nudge a single item. If you know which items have changed, though, you can accomplish the same result with code like the following:

const CSL = require('citeproc')
const fetch = require('node-fetch')

const style = await fetch(
    'https://cdn.jsdelivr.net/gh/citation-style-language/styles@master/apa.csl'
).then(response => response.text())
const locale = await fetch(
    'https://cdn.jsdelivr.net/gh/citation-style-language/locales@master/locales-en-US.xml'
).then(response => response.text())

function formatBibliography ([data, entries]) {
    console.log(data.bibstart + entries.join('') + data.bibend)
}

const data = {}

const engine = new CSL.Engine({
    retrieveLocale () { return locale },
    retrieveItem (id) { return data[id] }
}, style, 'en-US')

// If you change an item while keeping the same identifier, it does not reset:
data.a = { id: 'a', title: 'a', author: [{ family: 'a' }] }
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// "a. (n.d.). a."

// Change the item
data.a = { id: 'a', title: 'b', author: [{ family: 'a' }] }
// FB: drop the changed item(s) from the registry
// FB: (alternatively you could clear the registry with an empty array, but this will be a little faster)
engine.updateItems(Object.keys(data).filter(id => id !== data.a.id))
// FB: restore changed item(s)
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// "a. (n.d.). a."
// FB: returns "a. (n.d.). b."

// Resetting the engine by making a new registry does work...
// FB: registry is up to date
/*
 engine.registry = new CSL.Registry(engine)
 engine.updateItems(Object.keys(data))
 formatBibliography(engine.makeBibliography())
 // "a. (n.d.). b."
 */

// ...but when I remove an item and add a new item with the same author, it breaks:
delete data.a
data.c = { id: 'c', title: 'c', author: [{ family: 'a' }] }
// FB: updateItems will handle the registry adjustments
/*
 engine.registry = new CSL.Registry(engine)
 */
engine.updateItems(Object.keys(data))
formatBibliography(engine.makeBibliography())
// TypeError: Cannot read property 'id' of undefined
// FB: returns "a. (n.d.). c."
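Because the registry cannot tell that the data behind an ID has changed, the caller has to. One way, sketched here under the assumption that items are plain JSON-serializable CSL-JSON objects, is to keep a JSON snapshot of what was last passed to the engine and apply the same drop-then-restore sequence only to IDs whose snapshot differs. `ItemSync` and its methods are hypothetical names; only `updateItems` is citeproc-js API.

```javascript
// Hypothetical helper that remembers what the engine last saw for each ID.
class ItemSync {
  constructor (engine) {
    this.engine = engine
    this.snapshots = new Map() // id -> JSON string last passed to the engine
  }

  // IDs already registered whose data differs from the last sync.
  changedIds (data) {
    return Object.keys(data).filter(id =>
      this.snapshots.has(id) && this.snapshots.get(id) !== JSON.stringify(data[id]))
  }

  sync (data) {
    const ids = Object.keys(data)
    const changed = new Set(this.changedIds(data))
    if (changed.size > 0) {
      // Drop the changed items from the registry first...
      this.engine.updateItems(ids.filter(id => !changed.has(id)))
    }
    // ...then register the full current list: this restores changed items,
    // adds new ones and lets updateItems remove deleted ones.
    this.engine.updateItems(ids)
    this.snapshots = new Map(ids.map(id => [id, JSON.stringify(data[id])]))
    return [...changed]
  }
}
```

In the reproduction above, `sync.sync(data)` would stand in for each `engine.updateItems(Object.keys(data))` call. Comparing JSON strings is sensitive to key order, so an item rebuilt with its keys in a different order is treated as changed; that only costs an extra refresh.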

Thank you! Looks like it clears any possible problems with disambiguation as well.

> Surprising that this hasn't come up before.

I was surprised as well, but not anymore. I tried to add a fixture to my test suite, but apparently the error only occurs if the problematic data is passed the second time the engine is used (that is, on the first re-use). Whenever the engine has already been used for completely unrelated data, it does not break.
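Since a shared engine that has already processed unrelated data can hide the bug, a fixture that reproduces it probably needs its own engine per test case. A sketch of such a factory, reusing the already-fetched `style` and `locale` strings; `makeEngineFactory` is a hypothetical name, and `CSL` is passed in (e.g. `require('citeproc')`) rather than imported. Each call constructs a new `CSL.Engine`, which presumably parses the style again, so this gives up the speed that engine re-use was meant to provide.

```javascript
// Hypothetical factory: each call returns a brand-new engine loaded with
// the given CSL-JSON items, so no registry state leaks between test cases.
function makeEngineFactory (CSL, style, locale, lang = 'en-US') {
  return function createEngine (items) {
    const data = {}
    for (const item of items) data[item.id] = item
    const engine = new CSL.Engine({
      retrieveLocale () { return locale },
      retrieveItem (id) { return data[id] }
    }, style, lang)
    engine.updateItems(Object.keys(data))
    return engine
  }
}
```

A test case would then call `createEngine([...])` with just its own items and run its re-use sequence (change, delete, re-add) on that engine alone.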