fsfe / reuse-docs

REUSE recommendations, tutorials, FAQ and specification

Home Page: https://reuse.software

Documenting Licenses used when they only apply to part of the file.

OneDeuxTriSeiGo opened this issue · comments

I have a project where sources are licensed under the GPL-3.0-or-later but documentation is generally licensed under CC-BY-SA-4.0. Some of that documentation is inline in the source files and a tool generates the resulting documentation.

I want to be able to express that any docs that are inline in the sources are licensed under CC-BY-SA-4.0 as it is part of the documentation but also licensed under GPL-3.0-or-later as it is part of the source code (and I'm fine with the docs being available under either license or any later revision of those licenses).

But importantly I want the source code (excluding the documentation comments) only licensed under GPL-3.0-or-later.

I can express this with a comment header/license notice in each file but it's not clear to me how to codify this in the SPDX-License-Identifier expressions.

I couldn't find any documentation regarding how to approach this with REUSE but I'd wager this is a fairly common situation and guidance on this could be useful to other people in the future.

That is why we introduced snippets :)

### In-line snippet comments

> That is why we introduced snippets :)

Oh awesome. I was not even aware this was a thing. That said, it looks a bit verbose and awkward from what I can tell. It'd be ideal if you could define grammars in .reuse and then use a block at the top of the file to apply whatever tags that block contains to every section of the file that matches the block's grammar.

Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

> Oh awesome. I was not even aware this was a thing however it is certainly a bit verbose and awkward from what I can tell. It'd be ideal if you could specify grammars to tag in .reuse and then use a block at the top of the file to apply whatever tags you include in that block to any sections that match the grammar for the block.

We are working on renewing the .reuse part, but defining the start and end of snippets in an external file is quite finicky, so within REUSE we’d like to avoid it.

  • when you edit the file, the lines shift, so you’d need to re-check and re-define the snippet locations in .reuse
  • if someone copied the file without also copying the (hidden) .reuse, the license info of the snippet would be lost

Snippet support is already in the SPDX specification (on which REUSE is based), so check this out if you really need it:

https://spdx.github.io/spdx-spec/v2.3/file-tags/#h3-snippet-tags-format
https://spdx.github.io/spdx-spec/v2.3/snippet-information/#9.4

That said, if you rely on the snippet definition in an SPDX file, you would need to re-generate the SPDX file on every version, to make sure the tags point to the right lines …and the best way to do that so far is with REUSE snippet tags in the source code 😅

> Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

Good question. I don’t know what the tool does, but since the logic behind REUSE is to make code easy to reuse, snippets should be self-contained: if someone copies just the snippet with its tags, they have all the licensing info with it.
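As an illustration of the self-contained approach described above, here is a minimal sketch of a Rust source file in which the doc comment is wrapped in REUSE snippet tags and carries its own copyright and license tags. The copyright holder, year, and the `add` function are hypothetical placeholders, and how the `reuse` tool combines file-level and snippet-level tags is not confirmed in this thread.

```rust
// SPDX-FileCopyrightText: 2023 Jane Doe <jane@example.com>  (hypothetical holder)
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-SnippetBegin
// SPDX-SnippetCopyrightText: 2023 Jane Doe <jane@example.com>
// SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0
/// Returns the sum of `a` and `b`.
// SPDX-SnippetEnd
pub fn add(a: i32, b: i32) -> i32 {
    // The function body sits outside the snippet, so only the file-level
    // GPL-3.0-or-later tag is meant to apply to it.
    a + b
}
```

Because the copyright and license tags are repeated inside the snippet, copying only the lines between `SPDX-SnippetBegin` and `SPDX-SnippetEnd` still carries the licensing information along, at the cost of the repetition discussed in this thread.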

> We are working on renewing the .reuse part, but defining the start and end of snippets in an external file is quite finicky, so within REUSE we’d like to avoid it.
>
> * when you edit the file, the lines shift, so you’d need to re-check and re-define the snippet locations in .reuse
> * if someone copied the file without also the (hidden) `.reuse` the license info of the snippet would be lost

Oh yeah, I wasn't suggesting defining the start and end of each snippet, but rather defining a grammar (e.g. the regex `(?(?!\R\R)(\/\/\/.*\R)|(\/\/\/.*))+`, which matches any contiguous block of `///` Rust outer line doc comments) and then declaring a block of tags at the top of a file whose contents apply to any text in the rest of the file that matches the grammar. For example:

SPDX-SnippetsFromGrammarBegin: RUST_DOC_COMMENT
SPDX-tagname: <value>
...
SPDX-SnippetsFromGrammarEnd

where RUST_DOC_COMMENT is a regex defined somewhere in the repo (if not in .reuse).
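To make the grammar idea concrete, here is a rough sketch of how a tool could locate the spans such a grammar would cover. It is not part of REUSE or SPDX; `doc_comment_blocks` is a hypothetical helper that approximates the regex above with a plain line scan (it also allows leading whitespace and skips `////` lines, both of which are assumptions of this sketch).

```rust
/// Returns 1-based, inclusive `(start, end)` line ranges, one for each
/// contiguous run of `///` doc-comment lines in `source`.
pub fn doc_comment_blocks(source: &str) -> Vec<(usize, usize)> {
    let mut blocks = Vec::new();
    let mut start: Option<usize> = None;
    let mut last_line = 0;

    for (idx, line) in source.lines().enumerate() {
        let line_no = idx + 1;
        last_line = line_no;
        let trimmed = line.trim_start();
        // `///` starts an outer line doc comment; `////` is a plain comment.
        let is_doc = trimmed.starts_with("///") && !trimmed.starts_with("////");

        match (is_doc, start) {
            // A new run of doc-comment lines begins here.
            (true, None) => start = Some(line_no),
            // The run ended on the previous line.
            (false, Some(s)) => {
                blocks.push((s, line_no - 1));
                start = None;
            }
            _ => {}
        }
    }

    // Close a run that reaches the end of the file.
    if let Some(s) = start {
        blocks.push((s, last_line));
    }
    blocks
}
```

The ranges are recomputed from the file contents on every run, so they do not go stale when lines shift. The licensing tags themselves would still live in the header block, which is the part the maintainers caution can get lost when a snippet is copied on its own.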

I include a license header stating that the documentation in the file is dual-licensed but the code is not. It'd be awesome if I could get the SPDX file and associated automated tooling to reflect that properly without an egregious amount of repetition and line noise.

> Snippet support is already in the SPDX specification (on which REUSE is based), so check this out if you really need it:
>
> https://spdx.github.io/spdx-spec/v2.3/file-tags/#h3-snippet-tags-format
> https://spdx.github.io/spdx-spec/v2.3/snippet-information/#9.4
>
> That said, if you rely on the snippet definition in an SPDX file, you would need to re-generate the SPDX file on every version, to make sure the tags point to the right lines …and the best way to do that so far is with REUSE snippet tags in the source code 😅

Yeah, I figure if I can't find a more succinct way to handle it, that's what I'll have to do.

> Also will I need to redeclare the copyright tag for each documentation block or will it inherit any tags I give to the file?

> Good question. I don’t know what the tool does, but given the logic behind REUSE that it’s easy to re-use code, the snippets should be self-contained, so if someone just copies the snippet with the tags, they have all the licensing info with it.

Noted.

Unless I misunderstood you severely, I would still caution against what you are trying to do, because whatever practical benefit it might bring, if the license info is not self-contained it will eventually get lost.

And if you know a repo / package is set up so that a part of its license info is unreliable, you cannot trust the whole thing.

The harder it is for a random person and machine to figure out what license governs a specific file or line of code, the bigger this problem.

Hmmm yeah it's less than ideal for sure.

What I might be able to do instead is leave the license header text at the top of each file as I have it now, mark the source files as `SPDX-License-Identifier: GPL-3.0-or-later OR LicenseRef-OnlyDocs-CC-BY-SA-4.0`, and tag the generated documentation files as `SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0`.

That way it's clear:

  • whatever is present in the documentation files can be used under either license unless indicated otherwise (such as code examples, where we would use snippets).
  • everything in the source files can be used under GPL-3.0-or-later, but the documentation can also be used under the custom LicenseRef-OnlyDocs-CC-BY-SA-4.0 license (which would outline how the docs may be generated from those sources and that the generated files are licensed CC-BY-SA-4.0).
  • what licenses the contributors' contributions are made under.

And then the SBOMs should capture that when we run `reuse spdx`.

It's not the best solution, but I don't think I'm making any glaring omissions that would break compliance?
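For reference, a source file under this alternative might look roughly like the sketch below. The copyright holder, year, and the `double` function are hypothetical. One thing that is certain from the REUSE specification is that a custom `LicenseRef-` identifier needs a matching license text in the repository, here `LICENSES/LicenseRef-OnlyDocs-CC-BY-SA-4.0.txt`. Whether this expression conveys the intended "docs only" scope to automated tooling is exactly the open question above.

```rust
// SPDX-FileCopyrightText: 2023 Jane Doe <jane@example.com>  (hypothetical holder)
// SPDX-License-Identifier: GPL-3.0-or-later OR LicenseRef-OnlyDocs-CC-BY-SA-4.0
//
// The existing human-readable license header would stay here, explaining
// that only the documentation comments may be used under CC-BY-SA-4.0.

/// Doubles `x`. This doc comment is what the custom license is meant to cover.
pub fn double(x: i32) -> i32 {
    x * 2
}
```

Compared with snippet tags, this keeps each file to a single header, but the restriction to documentation lives only in the custom license text rather than in machine-readable line ranges.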