fsfe / reuse-docs

REUSE recommendations, tutorials, FAQ and specification

Home Page: https://reuse.software

Specify the format of `.license` files

kirelagin opened this issue:

Currently the spec only says:

> an adjacent file of the same name with the additional extension `.license`

In other words, it only specifies the name of the file, but does not clearly specify the contents.

Therefore, I propose to:

  1. State that this has to be a text file. (For context, POSIX defines what a text file is; related: fsfe/reuse-tool#187).
  2. Require it to be encoded as UTF-8. This one is tricky but, trust me, it needs to be done. For context, there is a fantastic blog post that explains the issue; it is written in the context of Haskell, but the general principles are the same. Related: fsfe/reuse-tool#221.

The simplest possible change would be `s/an adjacent file/an adjacent UTF-8-encoded text file/`. However, it might be worth making this even more explicit and splitting it into a separate sentence.
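To make the two proposed requirements concrete, here is a minimal sketch of what a check could look like. The function name `is_utf8_text` is hypothetical and not part of reuse-tool. It also simplifies the POSIX definition of a text file: it only rejects NUL bytes and an unterminated last line, and ignores the `LINE_MAX` limit.

```python
def is_utf8_text(data: bytes) -> bool:
    """Return True if data looks like a UTF-8-encoded text file.

    Hypothetical helper, simplified: POSIX also limits line length
    (LINE_MAX), which is not checked here.
    """
    # A text file must not contain NUL characters.
    if b"\x00" in data:
        return False
    # Assumption: a non-empty text file ends with a newline,
    # since every line is terminated by one.
    if data and not data.endswith(b"\n"):
        return False
    # The bytes must form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A tool could read a `.license` file in binary mode and pass the bytes to such a function before parsing it.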

One problem I see with making this a requirement is projects where UTF-8 is not an option.

For example, already if you look at the Linux kernel source code, it is encoded in ASCII. I suspect other software that is meant to be embedded in hardware will have a similar limitation.

I do think we should at least encourage UTF-8. How about adding "That file SHOULD be UTF-8-encoded." instead? That would be a strong suggestion, but not a requirement.

> For example, already if you look at the Linux kernel source code, it is encoded in ASCII

UTF-8 is an ASCII-compatible encoding (a superset of ASCII where every byte value that is allowed in ASCII means the same thing in UTF-8), so every ASCII text file is automatically a valid UTF-8 text file.
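As a quick illustration of this point, the following sketch decodes the same ASCII-only bytes with both codecs. The sample license line is made up for the example.

```python
# An ASCII-only byte string, as might appear in a kernel source header.
ascii_bytes = b"SPDX-License-Identifier: GPL-2.0-only\n"

# Decoding the same bytes as ASCII and as UTF-8 gives the same text.
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")
assert as_ascii == as_utf8

# Encoding that text as UTF-8 reproduces the original bytes exactly.
assert as_utf8.encode("utf-8") == ascii_bytes
```

So a project that restricts itself to ASCII would already satisfy a UTF-8 requirement without changing any file.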

> UTF-8 is an ASCII-compatible encoding (a superset of ASCII where every byte value that is allowed in ASCII means the same thing in UTF-8), so every ASCII text file is automatically a valid UTF-8 text file.

True, and that is a great feature of Unicode encodings :)

> True, and that is a great feature of Unicode encodings :)

Not Unicode encodings in general, just UTF-8. UTF-16, UCS-2 and UCS-4 are not ASCII supersets.
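A small sketch of the difference, using Python's built-in codecs. Python has no separate UCS-2 or UCS-4 codec names, so `utf-16-le` and `utf-32-le` stand in for them here; for ASCII-range text they produce the same bytes.

```python
text = "MIT\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # stand-in for UCS-2
utf32_bytes = text.encode("utf-32-le")  # stand-in for UCS-4

# UTF-8 output is byte-for-byte identical to ASCII.
assert utf8_bytes == ascii_bytes

# UTF-16 and UTF-32 use 2 and 4 bytes per character here,
# so the byte streams differ from the ASCII one.
assert utf16_bytes != ascii_bytes
assert utf32_bytes != ascii_bytes
assert len(utf16_bytes) == 2 * len(ascii_bytes)
assert len(utf32_bytes) == 4 * len(ascii_bytes)
```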