w3c / rdf-canon

RDF Dataset Canonicalization (deliverable of the RCH working group)

Home Page: https://w3c.github.io/rdf-canon/spec/

Check against i18n Review Checklist

philarcher opened this issue · comments

This short review is for the following spec: RDF Dataset Canonicalization.

  1. If the spec (or its implementation) contains any natural language text that will be read by a human (this includes error messages or other UI text, JSON strings, etc, etc), ensure that there’s metadata about and support for basic things such as language and text direction. Also check the detailed guidance for Language and Text direction.

    Comments_go_here.

    • RDF Datasets may contain natural language text. The methods for encoding such text are described in existing RDF specs. See the Turtle specification on RDF Literals for example. The RDF Canonicalization spec accepts any serialization of RDF but does not create a new one. The majority of the c14n spec is concerned with the algorithm for labeling blank nodes and ordering the resulting quads to create a canonical form. At the time of writing, the WG has not settled on whether such a canonical form has a canonical serialization (although it seems likely that it will). In other words, the current working assumption is that i18n issues are covered by the existing RDF standards and that no new issues are created by canonicalization. It is on this point that we would be very grateful for feedback.
  2. If the spec (or its implementation) allows content authors to produce typographically appealing text, either in its own right or in association with graphics, take into account the different typographic styles used around the world (for things such as line-breaking, text justification, emphasis or other text decorations, text selection and units, etc.). Also check the detailed guidance for Typographic support.

    Comments_go_here.

    • Not applicable
  3. If the spec (or its implementation) allows the user to point into text, creates text fragments, concatenates text, allows the user to select or step through text (using a cursor or other methods), etc., make allowances for the ways different scripts handle units of text. Also check the detailed guidance for Text-processing.

    Comments_go_here.

    • Not applicable
  4. If the spec (or its implementation) allows searching or matching of text, including syntax and identifiers, understand the implications of normalisation, case folding, etc. Also check the detailed guidance for Text-processing.

    Comments_go_here

    • This is where there may be a need for new i18n considerations over and above those already defined in the RDF specifications but we think it unlikely.
  5. If the spec (or its implementation) sorts text, ensure that it does so in locally relevant ways. Also check the detailed guidance for Text-processing.

    Comments go here.

    • We believe the existing RDF specs offer sufficient clarity in this regard but, again, we'd be grateful for any feedback on this. The RDF Dataset c14n algorithm does include a step where all quads are arranged in lexical order.
  6. If the spec (or its implementation) captures user input, ensure that it also captures metadata about language and text direction, and that it accommodates locale-specific input methods.

    Comments go here.

    • Not applicable
  7. If the spec (or its implementation) deals with time in any way that will be read by humans and/or crosses time zone boundaries, ensure that it will represent time as expected in locales around the world, and manage the relationship between local and global/absolute time. Also check the detailed guidance for Local dates, times and formats.

    Comments go here.

    • Not applicable
  8. If the spec (or its implementation) allows any character encoding other than UTF-8, make sure you have a convincing argument as to why, and then ensure that the character encoding model is correct. Also check the detailed guidance for Characters.

    Comments go here.

    • Not applicable
  9. If the spec (or its implementation) defines markup, ensure support for internationalisation features and avoid putting human-readable text in attribute values or plain-text elements. Also check the detailed guidance for Markup & syntax.

    Comments go here.

    • Not applicable
  10. If the spec (or its implementation) deals with names, addresses, time & date formats, etc., ensure that the model is flexible enough to cope with wide variations in format, levels of data, etc. Also check the detailed guidance for Local dates, times and formats.

    Comments go here.

    • Not applicable
  11. If the spec (or its implementation) describes a format or data that is likely to need localization, ensure that there’s an approach in place which allows effective storage and labelling of, and access to, localised alternatives for strings, text, images, etc.

    Comments go here.

    • We believe this is covered by the base RDF standards and won't be affected by c14n.
  12. If the spec (or its implementation) makes any reference to or relies on any cultural norms, ensure that it can be adapted to suit different cultural norms around the world (ranging from depictions of people or gestures, to expectations about gender roles, to approaches to work and life, etc).

    Comments go here.

    • Not applicable

Short i18n review checklist is here

If the spec (or its implementation) contains any natural language text that will be read by a human (this includes error messages or other UI text, JSON strings, etc, etc), ensure that there’s metadata about and support for basic things such as language and text direction. Also check the detailed guidance for Language and Text direction.

  • RDF Datasets may contain natural language text. The methods for encoding such text are described in existing RDF specs. See the [Turtle specification on RDF Literals](https://www.w3.org/TR/2014/REC-turtle-20140225/#literals) for example. The RDF Canonicalization spec accepts any serialization of RDF but does not create a new one. The majority of the c14n spec is concerned with the algorithm for labeling blank nodes and ordering the resulting quads to create a canonical form. At the time of writing, the WG has not settled on whether such a canonical form has a canonical serialization (although it seems likely that it will). In other words, the current working assumption is that i18n issues are covered by the existing RDF standards and that no new issues are created by canonicalization. It is on this point that we would be very grateful for feedback.

Note that JSON-LD instituted a pattern for defining text direction along with language using a datatype (see The i18n Namespace). In principle, this can be done in any serialization and, IMHO, should be a work item for the RDF-star WG, although it may have some semantic implications. We may want to mention that as part of the Internationalization checklist.
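For readers unfamiliar with that pattern, here is a minimal sketch of how such a datatype could be built. It assumes, as I understand the JSON-LD 1.1 i18n datatype form, that the IRI is the `https://www.w3.org/ns/i18n#` namespace followed by the lowercased language tag, an underscore, and the base direction. The helper names `i18n_datatype_iri` and `nquads_literal` are hypothetical and only for illustration.

```python
# Assumption: the i18n namespace IRI used by the JSON-LD datatype pattern.
I18N_NS = "https://www.w3.org/ns/i18n#"


def i18n_datatype_iri(language, direction):
    """Build a datatype IRI carrying both language and base direction.

    Assumed shape: <namespace><lowercased language>_<direction>.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return f"{I18N_NS}{language.lower()}_{direction}"


def nquads_literal(value, language, direction):
    """Render a literal in N-Quads style with the i18n datatype attached."""
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{i18n_datatype_iri(language, direction)}>'


print(nquads_literal("مرحبا", "ar-EG", "rtl"))
```

Because the direction travels inside an ordinary datatype IRI, a literal like this would pass through canonicalization as any other typed literal would, which is why the pattern could work in any serialization.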

  • If the spec (or its implementation) sorts text, ensure that it does so in locally relevant ways. Also check the detailed guidance for Text-processing.
    Comments go here.

    • We believe the existing RDF specs offer sufficient clarity in this regard but, again, we'd be grateful for any feedback on this. The RDF Dataset c14n algorithm does include a step where all quads are arranged in lexical order.

We do sort text as part of the algorithm function, and added specific text on using Unicode code point order for doing that. See Unicode code point order in Canonicalization Algorithm Terms.
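To illustrate why the ordering has to be pinned down, here is a small sketch contrasting Unicode code point order with UTF-16 code unit order, which some languages use by default for string comparison. The two example quads and the function names are made up for illustration; this is not the spec's algorithm, only the comparison rule it relies on.

```python
def sort_code_point(quads):
    """Sort serialized quads by Unicode code point order.

    Python 3 compares str values code point by code point, so the
    built-in sort already gives this ordering.
    """
    return sorted(quads)


def sort_utf16_code_unit(quads):
    """Sort by UTF-16 code unit order, shown only for contrast.

    Big-endian UTF-16 bytes compare the same way the 16-bit code units do.
    """
    return sorted(quads, key=lambda q: q.encode("utf-16-be"))


# Hypothetical quads: U+1F600 (outside the BMP) versus U+FF5E (inside it).
QUADS = [
    '<http://example.com/s> <http://example.com/p> "\U0001F600" .',
    '<http://example.com/s> <http://example.com/p> "\uFF5E" .',
]

by_code_point = sort_code_point(QUADS)
by_utf16 = sort_utf16_code_unit(QUADS)

# The two orderings disagree: U+FF5E < U+1F600 as code points, but the
# leading surrogate 0xD83D of U+1F600 sorts before the code unit 0xFF5E.
print(by_code_point == by_utf16)  # prints False
```

Two implementations that disagreed on this point would emit the same quads in different orders for such input, which is why stating code point order explicitly matters for a canonical form.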

Thanks @philarcher and @gkellogg. I have put this self-review on I18N's agenda for this week and have reviewed the above briefly.

We do sort text as part of the algorithm function

I recall our conversing about this in the previous issues and you should be fine with code point order.

Reviewing the above--thank you for the self-review--a few comments.

We believe the existing RDF specs offer sufficient clarity in this regard but, again, we'd be grateful for any feedback on this. The RDF Dataset c14n algorithm does include a step where all quads are arranged in lexical order.

In looking at the current WD, I see that (as noted by @gkellogg) you define and use Unicode code point order throughout. This is the right thing to do and you look to be in good shape here.

If the spec (or its implementation) allows searching or matching of text, including syntax and identifiers, understand the implications of normalisation, case folding, etc. Also check the detailed guidance for Text-processing.

Regarding matching/searching/find processing, I don't see anywhere that you are defining text searching/find operations or textual regular expressions, so I think that section doesn't apply to your spec.

We believe this is covered by the base RDF standards and won't be affected by c14n. (comment in regard to localizability)

I agree. This is N/A for your spec.

In general, a quick perusal of the WD didn't turn up any additional issues. As always, if you have questions or need assistance, please call out to I18N!