JATS: Plain text citation
rgieseke opened this issue · comments
Mixed citations (e.g. originally from bibitems in LaTeX) are not parsed when reading from JATS-XML.
I think the `mixed-content` should actually be `mixed-citation`: https://github.com/stencila/encoda/blob/master/src/codecs/jats/index.ts#L1165
Even nicer would probably be to keep all child elements of the `mixed-citation` as inline elements, e.g. to preserve parts in italics:
<mixed-citation>Norman de Plume <italic>The book of XML problems</italic>. XtraPress 2021.</mixed-citation>
Maybe this could be filed as `description` or `comment`?
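As a rough illustration of keeping inline structure, the sketch below splits the example citation above into plain text and italic runs. The `Emphasis` and `Inline` types and both function names are hypothetical stand-ins, not Encoda's actual schema types or codec code, and the regex only handles flat, non-nested `<italic>` elements; a real codec would walk a parsed XML tree.

```typescript
// Minimal stand-ins for inline node types (hypothetical, for illustration only).
type Emphasis = { type: 'Emphasis'; content: string[] }
type Inline = string | Emphasis

// Split the inner XML of a <mixed-citation> into text runs and <italic> runs.
function decodeMixedCitation(xml: string): Inline[] {
  // Drop the outer <mixed-citation ...> and </mixed-citation> tags.
  const inner = xml.replace(/^\s*<mixed-citation[^>]*>|<\/mixed-citation>\s*$/g, '')
  const inlines: Inline[] = []
  const pattern = /<italic>([\s\S]*?)<\/italic>/g
  let last = 0
  let match: RegExpExecArray | null
  while ((match = pattern.exec(inner)) !== null) {
    // Text before the italic run, if any, is kept as a plain string.
    if (match.index > last) inlines.push(inner.slice(last, match.index))
    inlines.push({ type: 'Emphasis', content: [match[1]] })
    last = match.index + match[0].length
  }
  if (last < inner.length) inlines.push(inner.slice(last))
  return inlines
}

// Flatten the inline nodes back to plain text, e.g. for use as a query string.
function toPlainText(inlines: Inline[]): string {
  return inlines
    .map((node) => (typeof node === 'string' ? node : node.content.join('')))
    .join('')
}
```

Keeping both forms available means the italics survive for display while the flattened text can still be used for any later lookup.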
I think the ideal way to handle a `<mixed-citation>` would be to try to "decode" it (i.e. parse it) into a `CreativeWork`. If we do that, then in-text `Cite` nodes will work as expected (i.e. show authors and years if needed).
With the fix that you made, the entirety of the `<mixed-citation>` ends up in the `title`:
type: Article
id: pone-0091296-Choat1
authors: []
title: >-
  Choat JH (2012) Spawning aggregations in reef fishes; ecological and
  evolutionary processes. In: Sadovy de Mitcheson Y, Colin PL, editors. Reef
  Fish Spawning Aggregations: Biology, Research and Management. Heidelberg:
  Springer. pp. 85–116.
whereas what we want is for the bibliographic info to be parsed out of the `<mixed-citation>` into
type: CreativeWork
authors:
  - type: Person
    familyNames:
      - Choat
    givenNames:
      - John Howard
datePublished:
  type: Date
  value: '2011-09-20'
identifiers:
  - type: PropertyValue
    name: doi
    propertyID: https://registry.identifiers.org/registry/doi
    value: 10.1007/978-94-007-1980-4_4
isPartOf:
  type: Periodical
  name: 'Reef Fish Spawning Aggregations: Biology, Research and Management'
publisher:
  type: Organization
  name: Springer Netherlands
title: Spawning Aggregations in Reef Fishes; Ecological and Evolutionary Processes
url: http://dx.doi.org/10.1007/978-94-007-1980-4_4
In Encoda, rather than trying to parse references into a `CreativeWork`, we take the approach suggested here and query CrossRef for bibliographic info. I didn't write the above YAML out by hand but rather used the `crossref` codec:
./encoda convert "Choat JH (2012) Spawning aggregations in reef fishes; evolutionary processes." --from crossref - --to yaml
I suggest that we use this approach for JATS `<mixed-citation>` (as we do in the `reshape` function). However, I think it would be wise to put the citation text in `name` or `alternateNames` or similar, and then do the CrossRef querying as a separate enrichment step that won't cause a failure if, for instance, there is no network connection. I think `description` should be avoided because that is where the abstract goes, and in some cases we actually have that; `comment` has a different semantic structure.
🎉 This issue has been resolved in version 0.111.0 🎉
> I suggest that we use this approach for JATS `<mixed-citation>` (as we do in the reshape function). However, I think it would be wise to perhaps put it in name or alternateNames or similar (I think description should be avoided because that is where the abstract goes and in some cases we actually have that; and comment has a different semantic structure) and then do the CrossRef querying as a separate enrichment step that won't cause a failure, if for instance there is no network connection.
Yes, I mistakenly thought that `description` belonged to the citation and not to the entire `CreativeWork`.
The CrossRef querying approach sounds great. How could that work?
Should it be an extra conversion? JATS to CrossRef enhanced JATS? Or should it be tried in the JATS codec?
> Should it be an extra conversion? JATS to CrossRef enhanced JATS?
Yes, that is what I was advocating for above. It shouldn't be part of the `decode` method of the `JatsCodec` but rather part of a generic function which can be applied to the `references` of any `Article`, no matter which format it originated from. That is exactly what currently happens here in the `reshape` function, but it is currently "converting" paragraphs into `CreativeWork`s using CrossRef:
Lines 341 to 370 in 52b872a
I think this code should be factored out into a separate `enrich` function and applied to `Paragraph`s in the references section, but also to `string` items in the `references` property of any `CreativeWork` (I had forgotten that `string` is a valid item in `references`).
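A minimal sketch of what such an `enrich` step might look like, assuming simplified stand-in types rather than the real Stencila schema. The `enrichReferences` name and the injected `lookup` function are hypothetical; in Encoda the lookup role would presumably be played by the `crossref` codec. The point illustrated is that only `string` references are touched and that a failed lookup (for example, no network connection) leaves the original string in place instead of failing the conversion.

```typescript
// Minimal stand-ins for schema types (hypothetical, for illustration only).
type CreativeWork = { type: 'CreativeWork'; title?: string; [key: string]: unknown }
type Reference = string | CreativeWork
type Article = { type: 'Article'; references?: Reference[] }

// Resolves citation text to a CreativeWork, or undefined when nothing matches.
// Injected so the step can be tested, or run, without network access.
type Lookup = (citation: string) => Promise<CreativeWork | undefined>

async function enrichReferences(article: Article, lookup: Lookup): Promise<Article> {
  const references: Reference[] = await Promise.all(
    (article.references ?? []).map(async (reference): Promise<Reference> => {
      // Already-structured references are passed through untouched.
      if (typeof reference !== 'string') return reference
      try {
        // Fall back to the plain string when the lookup finds nothing.
        return (await lookup(reference)) ?? reference
      } catch {
        // A lookup failure (e.g. being offline) must not fail the whole step.
        return reference
      }
    })
  )
  return { ...article, references }
}
```

Because the function takes any `Article`, it is independent of the format the article was decoded from, which is the property argued for above.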
In summary, what needs to happen if we take this direction is:
- In the `jats` codec, return a `string` for `<mixed-citation>` instead of setting the title:
  encoda/src/codecs/jats/index.ts, Lines 1165 to 1167 in 8dc58e3
- In the `reshape` function, turn `Paragraph`s that are in the "References" or "Bibliography" section into plain `string` items in `Article.references`
- Move the relevant code from the `reshape` function into a new `enrich` function where it is applied to any item in `Article.references` that is a `string`
- Enable `enrich` after `decode` by default but allow the user to disable it (like we do for `coerce` and `reshape`):
  Lines 271 to 273 in 39813e0
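The last step in the list above could be wired up roughly as follows. This is only a sketch under assumptions: the `read` function, the `Steps` and `ReadOptions` shapes, and the ordering of `coerce` before `reshape` are all hypothetical and not taken from Encoda's code. It shows each post-decode step being enabled by default and individually switchable.

```typescript
// Hypothetical stand-ins: a decoded document node and a post-decode step.
type DocNode = unknown
type Step = (node: DocNode) => Promise<DocNode>

// Each post-decode step is on by default and can be switched off individually.
interface ReadOptions {
  coerce?: boolean
  reshape?: boolean
  enrich?: boolean
}

// The functions to run are passed in so that the sketch stays self-contained.
interface Steps {
  decode: (content: string) => Promise<DocNode>
  coerce: Step
  reshape: Step
  enrich: Step
}

async function read(content: string, steps: Steps, options: ReadOptions = {}): Promise<DocNode> {
  const { coerce = true, reshape = true, enrich = true } = options
  let node = await steps.decode(content)
  if (coerce) node = await steps.coerce(node)
  if (reshape) node = await steps.reshape(node)
  // Enrichment runs last so that it sees references already turned into strings.
  if (enrich) node = await steps.enrich(node)
  return node
}
```

A caller without network access could then pass `{ enrich: false }` and still get the decoded, reshaped document with its references left as plain strings.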