Attribution: XML/RDF/Turtle please.

Question

midijohnny opened this issue a month ago · comments

It would be handy if there were some additional formats for the "Credit the creator" section.
In particular, I would suggest it should include at least a simple well-formed XML record.

Better: one that corresponds to the Dublin Core specification: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/

Or RDF in general - including a 'Turtle format'.

(Although publishing in Dublin Core XML format would probably be enough for others to automatically translate this to other forms of RDF.)

I would suggest this would also encourage more compliance with attribution, since it would be easier for the author to automatically credit creators.

Olga Bulat · Answer 1 · Mon Jun 03 2024 23:25:59 GMT+0800 (China Standard Time)

I added "frontend" label because I think this refers to the frontend single result page's "Credit the creator" section, not the API's "attribution" property.

Dhruv Bhanushali · Answer 2 · Wed Jun 05 2024 19:45:51 GMT+0800 (China Standard Time)

Attribution formats like XML should be supported by the API as well imo, so having both is 👌 .

sarayourfriend · Answer 3 · Thu Jun 06 2024 10:38:21 GMT+0800 (China Standard Time)

DC sounds great! CC REL already uses DC terms, and the rich-text/HTML version of the attribution would be relatively easy to translate into a DC XML fragment.

Princewill Onyenanu · Answer 4 · Mon Jun 10 2024 23:28:13 GMT+0800 (China Standard Time)

I'd love to take this on.
Quick question off the top of my head: should the generation of the XML attributions be done at the API level or on the frontend?

sarayourfriend · Answer 5 · Tue Jun 11 2024 08:02:48 GMT+0800 (China Standard Time)

Currently all frontend attribution generation happens in JavaScript: https://github.com/WordPress/openverse/blob/main/frontend/src/utils/attribution-html.ts. The Python openverse-attribution package also exists, but we can back-port this feature there later on, if it's needed. For now, just add it to the frontend.

The frontend's attribution-html module generates the HTML for each type of attribution. Rich text is the same as the HTML, but we render the HTML directly, rather than displaying the HTML as code to copy. Plain text is the same, but without any markup.

The XML snippet should just be another output option. You can use the existing methods for generating HTML to generate the XML.
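As a rough illustration of "XML as another output option", here is a minimal TypeScript sketch. It does not reflect the actual attribution-html module: `AttributionParts`, `AttributionFormat`, `plainAttribution`, `xmlAttribution`, and `renderers` are all hypothetical names, and the sketch assumes the `dc:rights` text can simply reuse the plain-text attribution, as in the example further down this comment.

```typescript
// Hypothetical input shape; the real Openverse media objects have more fields.
interface AttributionParts {
  title: string
  creator: string
  licenseName: string
  licenseUrl: string
}

// Hypothetical format names; the frontend also offers rich text and HTML.
type AttributionFormat = "plain" | "xml"

// Escape the characters that are significant in XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

// Plain-text attribution sentence, without any markup.
function plainAttribution(p: AttributionParts): string {
  return `"${p.title}" by ${p.creator} is licensed under ${p.licenseName}. To view a copy of this license, visit ${p.licenseUrl}.`
}

// Dublin Core elements; dc:rights reuses the plain-text sentence (an assumption).
function xmlAttribution(p: AttributionParts): string {
  return [
    `<dc:creator>${escapeXml(p.creator)}</dc:creator>`,
    `<dc:title>${escapeXml(p.title)}</dc:title>`,
    `<dc:rights>${escapeXml(plainAttribution(p))}</dc:rights>`,
  ].join("\n")
}

// One renderer per output format, so adding a format means adding one entry.
const renderers: Record<AttributionFormat, (p: AttributionParts) => string> = {
  plain: plainAttribution,
  xml: xmlAttribution,
}
```

Calling `renderers.xml(parts)` would then return the three `dc:` elements as a newline-separated string, with `&`, `<`, and `>` in titles or creator names escaped.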

Are you familiar with DC or RDF @madewithkode? There are a lot of resources online about both, but DublinCore's own documentation tends to be the best, and here's their documentation about RDF/XML specifically: https://www.dublincore.org/specifications/dublin-core/usageguide/#rdfxml and https://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/

The snippet there already gives a good idea of how to add the parts we'd need; it's essentially 1:1 with that, except we'd also populate dc:rights. Something like this, using https://openverse.org/image/feb91b13-422d-46fa-8ef4-cbf1e6ddee9b?q=galah as an example:

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/">

   <rdf:Description rdf:about="https://www.flickr.com/photos/126953422@N04/40593461235">

      <dc:creator>Graham Winterflood</dc:creator>
      <dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
      <dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>

   </rdf:Description> 
</rdf:RDF>

That interprets dc:rights as the broadest possible rights statement, and makes things relatively "uncomplicated" for us, when it comes to deciding how to represent CC with just DC. If we want to bring in CC REL, that's a separate story. I believe we could offer that, but if we want just the most basic RDF representation with just DC, this is probably it. Users can edit down dc:rights to whatever makes sense for their use case. This also has us ignoring a bunch of DC's recommendations for how to format DC XML, including not using DC (with XSI) to designate the type of resource, the type of resource identifier, and more detailed information about the rights statement.

However, I think we shouldn't create the full RDF XML, and instead, just offer the DC elements as XML (and we could follow this up by offering different formats like Turtle or JSON-LD in the future, as separate issues). So then, we'd just have a copyable snippet, with some explanatory text. Maybe like this:

<dc:creator>Graham Winterflood</dc:creator>
<dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
<dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>
<dc:identifier>https://www.flickr.com/photos/126953422@N04/40593461235</dc:identifier>
<dc:type>StillImage</dc:type>

dc:type should be Sound for audio.
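The image/audio distinction could be captured with a small lookup, sketched here in TypeScript. `MediaType`, `DCMI_TYPE`, and `dcTypeElement` are hypothetical names for illustration; only the two values StillImage and Sound come from this comment.

```typescript
// Hypothetical union of the media types discussed in this thread.
type MediaType = "image" | "audio"

// DCMI Type Vocabulary terms for each media type.
const DCMI_TYPE: Record<MediaType, string> = {
  image: "StillImage",
  audio: "Sound",
}

// Build the dc:type element for a given media type.
function dcTypeElement(mediaType: MediaType): string {
  return `<dc:type>${DCMI_TYPE[mediaType]}</dc:type>`
}
```

Typing the lookup as a `Record` over `MediaType` means the compiler would flag a missing entry if another media type were added to the union later.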

It can be that simple, if we like. @midijohnny please let me know if I've got this wrong... I'm basing this on just 6 months of Library and Information Services courses I took recently, and only did a small amount of DC, but never anything in XML.

I don't think we should try to use DCMI terms (like implementing RightsStatements) because ultimately DC is so flexible, every institution or system realistically has its own approach to how they want to use it. Listing the DC terms like this as an XML snippet is my guess at the most flexible version of what we could do here.

sarayourfriend · Answer 6 · Tue Jun 11 2024 08:03:45 GMT+0800 (China Standard Time)

Assigned to you, @madewithkode, but it's probably a good idea to wait for @midijohnny to give more input before going too strongly in one direction (snippet, full RDF, which terms to use, etc). I do think it's best to stick with just an XML snippet for this first issue.

Princewill Onyenanu · Answer 7 · Tue Jun 11 2024 15:42:52 GMT+0800 (China Standard Time)

I agree with you @sarayourfriend, any extra or more specific details regarding what's required would be appreciated. And thank you for the really in-depth insights on this topic, I'll be sure to check out the resources you shared, as I do not have any prior experience with the other markups/specifications being discussed aside from XML. I'll stand by on this a bit to see if @midijohnny has anything more to add before getting started.

midijohnny · Answer 8 · Wed Jun 12 2024 19:38:15 GMT+0800 (China Standard Time)

Great discussion! I'm not an expert in RDF or Dublin Core either - but I would say the example above ("Maybe like this...") is going to be good enough - with one minor alteration - to include a root element with a namespace identifier.
That way, we would have a well-formed XML document in a specific namespace.

So something like:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
	<dc:creator>Graham Winterflood</dc:creator>
	<dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
	<dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>
	<dc:identifier>https://www.flickr.com/photos/126953422@N04/40593461235</dc:identifier>
	<dc:type>StillImage</dc:type>
</metadata>

It doesn't have to be 'metadata' - it could be (say) 'attribution' or whatever you think is best.

Having a well-formed document like this - with the namespace included (so people can look up the vocabulary based on the namespace) - would provide a large benefit, I think.
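A minimal TypeScript sketch of that wrapping step, assuming the `dc:` elements have already been generated and escaped elsewhere. `wrapInRoot` and `DC_NAMESPACE` are hypothetical names, and the root element name is left configurable since it has not been settled in this thread.

```typescript
// Namespace URI for the Dublin Core element set, as used in the example above.
const DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

// Wrap pre-built, already-escaped dc: elements in a single root element that
// declares the dc prefix. rootName is assumed to be a valid XML element name.
function wrapInRoot(elements: string[], rootName: string = "metadata"): string {
  const body = elements.map((element) => `\t${element}`).join("\n")
  return `<${rootName} xmlns:dc="${DC_NAMESPACE}">\n${body}\n</${rootName}>`
}
```

For example, `wrapInRoot(["<dc:type>StillImage</dc:type>"], "attribution")` would produce an `attribution` root element with the namespace declaration and the one child element indented by a tab.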

It means (for instance) somebody downstream can build an XSLT to transform this to what suits them.
You could even consider using this (or something similar) as the 'base' information and use XSLT to transform to the HTML/plain-text format to be displayed on the website - but that is just a suggestion.

For my purposes: I was collecting images to display in an XHTML (i.e. well-formed XML) environment, so if I had the format above it would have made my life easier.

midijohnny · Answer 9 · Wed Jun 12 2024 19:42:57 GMT+0800 (China Standard Time)

For additional context - here's why I logged the original request.
I was building a small example that needed some example images and I wanted to make sure I displayed the attribution (of course) - I had to build my own representation in a file images.xml, but if the original attribution information was already available in a relatively simple well-formed document, I would have just been able to use that (perhaps with a minor edit) straight off.

sarayourfriend · Answer 10 · Thu Jun 13 2024 05:14:19 GMT+0800 (China Standard Time)

Perfect, thanks very much @midijohnny! I was wondering how best to include the namespace, that looks great. And makes things more flexible for the future if we want to implement CC REL.

@madewithkode how do you feel about starting on this, when you have time? Do you feel you have enough to go on to get started?

Princewill Onyenanu · Answer 11 · Thu Jun 13 2024 22:13:04 GMT+0800 (China Standard Time)

Sorry I'm late guys, been battling a flu. Really great insights and extra context @midijohnny
@sarayourfriend sure, I should be able to start off something with the information at hand, once I'm fully back.

sarayourfriend · Answer 12 · Fri Jun 14 2024 05:43:46 GMT+0800 (China Standard Time)

No worries at all, take your time and get well soon! There's no rush or pressure with this.

zack · Answer 13 · Wed Jun 19 2024 05:06:17 GMT+0800 (China Standard Time)

I wanted to share some prior art here concerning XML. The Dublin Core we're adding in #4499 looks good. I also remembered today that Creative Commons' own License Chooser offers Extensible Metadata Platform (XMP) format, which is XML in a .xmp file.

Fun fact: @obulat implemented it a few years ago in this PR: creativecommons/chooser#272. A small change was made to that implementation shortly after.

I wonder if we should support that format as well?

sarayourfriend · Answer 14 · Wed Jun 19 2024 06:57:15 GMT+0800 (China Standard Time)

What's the use case for downloading an XMP snippet? Wouldn't you use your image editor software to add that data, either embedded as an EXIF extension or as a sidecar file? I didn't know it could (or would) be used for attribution, I'm only familiar with it for use to describe the immediate work. I guess if you can add arbitrary additional metadata, attributions would just go there? I'm not sure how you would structure attributions in DC. An array of RightsStatements?

zack · Answer 15 · Wed Jun 19 2024 22:40:32 GMT+0800 (China Standard Time)

The only (possible) use case I can think of is in the context of remixing works, where you might want to store or modify the original XMP for your new, derived work.

Even that feels somewhat contrived though. Probably best to wait on that until someone with a clear use case asks for it, as happened with Dublin Core here! 😄