Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

Use Citeproc-JS to extract / parse name parts?

coryschires opened this issue · comments

I have an odd question... Citeproc-JS does a great job parsing names, specifically extracting less common name parts (e.g. suffix, particle, etc) from overloaded family and given fields. In case it's not obvious, I'm talking about the behavior described here and here.

I need this exact bit of functionality elsewhere in my code. Rather than re-implement it myself, I would like – if possible – to use Citeproc-JS for this task.

Unfortunately, I'm a little confused spelunking in the source code. I think CSL.parseParticles is the function I need (or something in that area). But I can't quite seem to find the right method.

For example, I have tried:

let name = {"given": "John", "family": "Doe Jr."}
CSL.parseParticles(name, true)  // => undefined

Is this possible? Clearly, Citeproc-JS is doing this work internally. But is there a publicly accessible function that I can use?

Any help would be much appreciated!

PS: Thanks for writing this library! It's really the best of its kind.

Little nudge here. Y'all have any leads / ideas? Otherwise I will need to reimplement this logic myself (which is probably gonna be terrible)

@fbennett That would be great if you could pull that out! Denis, Bruce, and I were talking that we'd like to formalize those two-field processing steps so that other citeproc developers can take advantage of your work there.

Awesome! I'm glad to hear you are open to this idea.

do you need to receive two-field input and split it into dropping, non-dropping particles, family name, given name, and suffix?

Yes. That sums it up.


A little more info about my use case, as I suspect others may encounter a similar – if not the exact same – problem.

Internally, I am storing the name data in two fields. This has proven handy because it keeps my UI simple (i.e. no need for additional fields for suffix, non-dropping particles, etc). And, in practice, there's really no drawback because citeproc-js accurately teases all the data apart automatically. So far, so good.

Additionally, I would like to send this author name data to Crossref (but it really could be any service). Understandably, Crossref expects name metadata – suffixes, particles, etc – to be tagged separately from given and family. So, like citeproc-js, I need to tease this less common metadata out of the two main fields. It would be sweet if I could use citeproc-js to do this work.

@fbennett Just a friendly nudge on this issue.

I'm sure you're busy, and I appreciate all the work you've put into this project!

@fbennett Another friendly nudge!

Thanks again for your work!

For example, I have tried:

let name = {"given": "John", "family": "Doe Jr."}
CSL.parseParticles(name, true)  // => undefined

parseParticles modifies its input. Just inspect name after the call, the data is all there.

Rather than pulling the algorithm out, pass it through a bundler. Tree-shaking will remove what you don't need.
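If the in-place mutation is inconvenient, a small wrapper can hand back a parsed copy and leave the original object alone. This is only a sketch: `parseNameCopy` is a hypothetical helper name, not part of citeproc-js, and it takes the parser as an argument rather than assuming how the library is loaded.

```javascript
// Hypothetical helper (not part of citeproc-js): run an in-place name parser
// on a copy, so the caller's object is left untouched.
// `parse` is any function that mutates the name object it receives.
function parseNameCopy(parse, name) {
  // Name fields are flat strings, so a shallow copy is enough.
  const copy = { ...name };
  // In-place parsers return undefined; the result ends up on `copy`.
  parse(copy);
  return copy;
}

// Assumed usage with citeproc-js loaded as CSL:
//   const parsed = parseNameCopy((n) => CSL.parseParticles(n), name);
```

Wrapping the call in an arrow function avoids relying on how `CSL.parseParticles` handles `this`.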

@retorquere Thanks for the tip. I'll give it a try later today and follow up with a comment.

parseParticles modifies its input. Just inspect name after the call, the data is all there.

If it's this simple, I'll be kicking myself!

@retorquere Unfortunately, this tip does not seem to work:

let name = {"given": "John", "family": "Doe Jr."}
CSL.parseParticles(name, true)
console.log(name)
//  => { given: 'John', family: 'Doe Jr.' }

Perhaps I am misunderstanding your advice?

@fbennett

If it helps, I'd be happy to make a pull request to support these changes.

All I would need from you (if it's not too much trouble) would be a couple sentences pointing me in the right direction. I can probably figure it out from there, write up a PR, and we can hash out the details.

Thanks again!

const CSL = require('citeproc')
let name = {"given": "John, Jr.", "family": "van der Doe"}
CSL.parseParticles(name)
console.log(name)

suffix apparently lives on given, not family. The true parameter doesn't appear to be doing anything, I can't find it in the source.
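For data that already has the suffix stored at the end of family (as in the "Doe Jr." example), one option is to move it into given before parsing. The following is a rough sketch under assumptions: `moveSuffixToGiven` and the `SUFFIXES` list are made up for illustration, and the list is deliberately short, so real data would need a fuller list.

```javascript
// Illustrative suffix list (an assumption, not taken from citeproc-js).
const SUFFIXES = ["Jr.", "Sr.", "II", "III", "IV"];

// Hypothetical helper: if `family` ends with a known suffix, return a new
// name object with that suffix appended to `given` after a comma.
function moveSuffixToGiven(name) {
  const parts = name.family.trim().split(/\s+/);
  const last = parts[parts.length - 1];
  // Leave the name alone when there is no recognised trailing suffix.
  if (parts.length < 2 || !SUFFIXES.includes(last)) {
    return { ...name };
  }
  return {
    ...name,
    // Drop the suffix, plus any comma that preceded it ("Doe, Jr.").
    family: parts.slice(0, -1).join(" ").replace(/,$/, ""),
    given: `${name.given}, ${last}`,
  };
}

console.log(moveSuffixToGiven({ given: "John", family: "Doe Jr." }));
// => { given: 'John, Jr.', family: 'Doe' }
```

The returned object can then be handed to CSL.parseParticles in the same way as the example above.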

@fbennett

If it helps, I'd be happy to make a pull request to support these changes.

All I would need from you (if it's not too much trouble) would be a couple sentences pointing me in the right direction. I can probably figure it out from there, write up a PR, and we can hash out the details.

The particle parser lives here. The challenge would be that as far as I can tell, the sources are just concatted together; modularising would mean some kind of bundler would be involved to make a single-source version from the modularised version. Not incredibly complex, but work.

@retorquere This seems to be working. Thanks so much!

let name = {"given": "John, Jr.", "family": "van der Doe"}
CSL.parseParticles(name)
console.log(name);

// => {
//   given: 'John',
//   family: 'Doe',
//   'non-dropping-particle': 'van der',
//   suffix: 'Jr.'
// }

Damn. I can't believe I didn't notice that suffixes should go in given rather than family. I must have been reading the docs too quickly.


As for extracting this method: to be clear, I don't care about it being literally stand-alone (as in how it bundles, exports, etc). I just wanted to have a public method which I could use to do this work (i.e. I wanted exactly CSL.parseParticles). So this is perfect. I just was being a little slow understanding how to use it.

Thanks again!!!

PS: Agreed. The second true argument doesn't seem to make any difference. So I guess it's not needed.

@retorquere @fbennett Not to reopen this issue, but I wanted to point out a discrepancy between the behavior of CSL.parseParticles and these docs.

Taking the example directly from the docs:

let name = { given: 'Rev. Martin Luther Jr., Ph.D.', family: 'King' }
CSL.parseParticles(name)
console.log(name);

This produces the following parsed name:

{
    'given': 'Rev. Martin Luther Jr.', 
    'family': 'King', 
    'suffix': 'Ph.D.' 
}

Whereas the docs say that it should produce:

{
    "family": "King",
    "given": "Martin Luther",
    "suffix":"Jr., Ph.D.",
    "dropping-particle":"Rev."
}

Fortunately, this discrepancy is not a problem for me. So, afaic, we're good to go. Nevertheless, I wanted to mention it in case it's evidence of a bug or perhaps the docs need to be updated.
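To make the discrepancy easier to see, here is a small field-by-field comparison of the two objects quoted above. It is only a sketch: `diffNameParts` is a hypothetical helper, and the two objects are copied from this comment rather than produced by calling the library.

```javascript
// Hypothetical helper: list the fields where two name objects differ.
function diffNameParts(actual, expected) {
  const keys = new Set([...Object.keys(actual), ...Object.keys(expected)]);
  const diffs = {};
  for (const key of keys) {
    if (actual[key] !== expected[key]) {
      diffs[key] = { actual: actual[key], expected: expected[key] };
    }
  }
  return diffs;
}

// Output observed from CSL.parseParticles, as reported above.
const observed = {
  given: "Rev. Martin Luther Jr.",
  family: "King",
  suffix: "Ph.D.",
};

// Output described in the docs, as quoted above.
const documented = {
  family: "King",
  given: "Martin Luther",
  suffix: "Jr., Ph.D.",
  "dropping-particle": "Rev.",
};

console.log(diffNameParts(observed, documented));
```

Only family matches; given, suffix, and dropping-particle all differ.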

Thanks again!!!