altoxml / schema

ALTO XML schema - latest and all former versions

License of the schemas?

kba opened this issue

Under what terms can the ALTO schema be used? Is there an authoritative source that I can link to?

Can I re-distribute them and possibly adapt them?

My use case is a format-agnostic OCR format transformation/validation tool.

The ALTO schema is open source and can be used free of charge.
ALTO is now officially hosted by the Library of Congress (LoC), and the following link (the same as the namespace) can be cited as the source: http://www.loc.gov/standards/alto
I hope this information is sufficient. If not, do not hesitate to get back to me.
More details about the history of its creation can be found here:
http://www.loc.gov/standards/alto/about.php

Thanks for the information, that's what I wanted to hear :-) I just linked to your comment if anyone is interested in the license situation.

It could not hurt to state that somewhere on the official pages or to add a standard LICENSE file.

The only information on usage rights I found with a quick search was rather strict, e.g. for the documentation:

Copyright © 2014 ALTO Board. All rights reserved.
No part of this publication may be reproduced, stored in databases, or transferred in any form
(electronically, photo-mechanically, chemically, manually, or otherwise) without the express written
permission of the ALTO Board. The software described in this manual is licensed software that may be
used only in compliance with the licensing terms and conditions. The ALTO Board reserves the right to
make changes to the content of this manual without notice. The ALTO Board makes no guarantee
regarding the accuracy of the information provided in this manual. Microsoft, MS-DOS, and Windows are
registered trademarks of the Microsoft Corporation.
Parts of the software uses the Duden-Proof-Factory of the Brockhaus Duden Neue Medien GmbH for
syllable separation.
Product or company names that are mentioned may be trademarks or registered trademarks of the
respective company. The ALTO Board uses these names and trademarks in the following manual merely
for explanatory purposes and for the benefit of the respective user, and such use does not imply trademark
infringement.
Under this software license, you are only permitted to reproduce materials that are not protected by
copyright laws. This excludes only materials where you hold the copyright and/or legal permission to
reproduce copyrighted materials. If you are uncertain about the copyright status of certain materials then
please seek legal counsel. The ALTO Board holds no liability over copyright violations resulting from the
use of this software.

from https://github.com/altoxml/documentation/blob/master/v2/ALTO_changes_2_1.pdf

It would be good to introduce a machine-readable license for the ALTO schema - e.g. CC-BY-SA or CC-BY would seem a good fit. Similar discussions have been held with regard to METS.

METS uses CC0 (see http://www.loc.gov/standards/mets/version110/mets.xsd). I'm inclined to say we go with that, so I don't have to go to our legal department.

+1 for CC0, if that's okay with LoC, esp. given METS has this as well. Thanks Nate!

CC0 is one of the options we discussed during our last teleconference, so it should be fine.

If a standard license is agreed upon, it would be useful to also include this information in a comment in the ALTO schema, similarly to METS, e.g.

<!-- ALTO: Analyzed Layout and Text Object  -->
<!-- Originally created during the EU-funded Project METAe ...  -->
<!-- Prepared for the Library of Congress ...  -->
...
<!-- This document is available under the Creative Commons CC0 1.0 Universal 
Public Domain Dedication (http://creativecommons.org/publicdomain/zero/1.0/). 
For the full text see http://creativecommons.org/publicdomain/zero/1.0/legalcode.  -->

We may consider following current practices. For example, the Dublin Core Metadata Initiative uses the following: http://creativecommons.org/licenses/by/3.0/

I agree with Raju. This restriction makes sense: it ensures that the standard remains open, that derivatives link back to the source, and that adaptations are identified.

Just an update, following my mail to the board.
The copyright for ALTO is in the approval process. To be safe, we at CCS are working through Günther Mühlberger with the lawyers of the Library of Innsbruck to obtain written agreement to the change to CC BY-SA 4.0, because a former copyright statement was removed at the handover to LoC. We just want to be sure we have official written agreement from all of the METAe authors.

The fastest and safest way to achieve this is to draft a proposal for the header that includes the CC0 / CC BY-SA statement and then ask for confirmation. I checked METS, which, as Clemens outlined, has this note. None of the other standards such as MODS, MADS, etc. has a statement right now.

Here is my proposal with the CC BY-SA 4.0 license. Alternatively, the CC0 text proposed by Clemens would go in the same position (which follows the METS ordering):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
<!-- ALTO: Analyzed Layout and Text Object  -->
<!-- This document is available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0 - https://creativecommons.org/licenses/by-sa/4.0/). The METAe working group has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. For the full text see https://creativecommons.org/licenses/by-sa/4.0/legalcode. -->

<!-- Originally created during the EU-funded Project METAe, the Metadata Engine Project (2001 - 2003), by Alexander Egger (1), Birgit Stehno (2) and Gregor Retti (2), (1) University of Graz and (2) University of Innsbruck, Austria with contributions of Ralph Tiede, CCS GmbH, Germany -->
<!-- Prepared for the Library of Congress by Ralph Tiede, CCS GmbH, with the assistance of Justin Littman (Library of Congress). -->

<!-- Version x.y -->

<!-- Change History -->

<xsd:schema ...>
</xsd:schema>
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Please confirm this or provide an alternative proposal by the next call, so that we can sign it off on the call and then send the draft to UIBK.

Thank you @Jo-CCS.

Two quick remarks:

  1. In line 2 of the proposed new header, it states
    "The Digital Library Federation, as owner of this standard, has waived all rights..."
    This should be adapted to fit ALTO (who owns it?).

  2. According to our legal expert, we should be fine with the CC BY-SA 4.0 license (it fixes some issues compared to version 3.0, cf. https://creativecommons.org/share-your-work/licensing-considerations/version4/)

Accept. LC's legal dept will not have an issue.

I corrected the wording once more while working on the registration for the MIME type "application/alto+xml".

The wording about the owner has been adapted, and the confirmation of the original authors has been added to it. I will now merge this into the new version 3.2, where it will finally be confirmed again.
Those who have not yet given their acceptance here, please confirm or comment:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
<!-- ALTO: Analyzed Layout and Text Object  -->
<!-- This document is available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0 - https://creativecommons.org/licenses/by-sa/4.0/ ). 
The ALTO Editorial Board has waived all rights to it worldwide under copyright law with confirmation of the original creating authors, including all related and neighboring rights, to the extent allowed by law.
For the full text see https://creativecommons.org/licenses/by-sa/4.0/legalcode. -->

<!-- Originally created during the EU-funded Project METAe, the Metadata Engine Project (2001 - 2003), by Alexander Egger (1), Birgit Stehno (2) and Gregor Retti (2), (1) University of Graz and (2) University of Innsbruck, Austria with contributions of Ralph Tiede, CCS GmbH, Germany -->
<!-- Prepared for the Library of Congress by Ralph Tiede, CCS GmbH, with the assistance of Justin Littman (Library of Congress). -->

<!-- Version x.y -->

<!-- Change History -->

<xsd:schema ...>
</xsd:schema>
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The CC ShareAlike 4.0 wording looks fine to me, Jo.

As of v4.0, the license of the ALTO schema has been defined as CC-BY-SA 4.0.

Can the versions < 4.0 also be CC-BY-SA 4.0 licensed? If so, adding a LICENSE file here would make sense.

@stweil in UB-Mannheim/ocr-fileformat#82

@kba, does this also apply to the older ALTO versions? I found no CC-BY-SA there.

Addendum: altoxml/schema:v3/alto-3-2-draft.xsd@master also includes the license, but in older versions it seems to be missing. What about adding a LICENSE file in the root (so GitHub can also show the license)?
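
For anyone who wants to check which schema files carry the license notice in their header, here is a minimal sketch in Python. It only assumes what the proposed header above shows: the notice sits inside an XML comment and contains a creativecommons.org URL. The function name `find_cc_license_urls` and both regular expressions are illustrative and are not part of any ALTO tooling.

```python
import re

# XML comments; the proposed ALTO header puts the license notice in one.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

# Creative Commons license or public-domain dedication URLs (assumed URL shape).
CC_URL_RE = re.compile(
    r"https?://creativecommons\.org/(?:licenses|publicdomain)/[\w.\-/]+"
)


def find_cc_license_urls(xsd_text):
    """Return Creative Commons URLs found inside XML comments.

    URLs are returned in order of first appearance, without duplicates.
    A trailing sentence period is stripped from each URL.
    """
    urls = []
    for comment in COMMENT_RE.findall(xsd_text):
        for url in CC_URL_RE.findall(comment):
            url = url.rstrip(".")
            if url not in urls:
                urls.append(url)
    return urls
```

Reading each `.xsd` file of a local checkout as text and passing it to `find_cc_license_urls` gives an empty list for files without such a comment. This is a plain text scan, not a legal check: it says nothing about whether a license actually applies to a file.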

@kba No, sadly the decision for CC-BY-SA 4.0 will only take effect for v4.0 and any future versions of ALTO. The decision of the board was to refer to the license in the header of the schema, rather than in a separate LICENSE file (which could be added to GitHub in the repository for v4), as this is the way it is done on all LOC standards like METS etc.

The older schema files cannot be modified, but a corresponding notice has now been placed on the GitHub pages at:
https://github.com/altoxml/documentation/blob/master/README.md
and
https://github.com/altoxml/schema/blob/master/README.md

This should clarify that the license also applies to the older versions published since the Library of Congress began hosting ALTO.