Specify the format of `.license` files

Question

kirelagin opened this issue 4 years ago · comments

Kirill Elagin commented 4 years ago

Currently the spec only says:

> an adjacent file of the same name with the additional extension `.license`

In other words, it only specifies the name of the file, but does not clearly specify the contents.

Therefore, I propose to:

1. State that this has to be a text file. (For context, POSIX defines what a text file is; related: fsfe/reuse-tool#187.)
2. Require it to be encoded as UTF-8. This one is tricky, but, trust me, it needs to be done. For context, there is a fantastic blog post that explains the issue; it is written in the context of Haskell, but the general principles are the same. Related: fsfe/reuse-tool#221.
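
To make the two proposed requirements concrete, here is a minimal sketch of what such a check could look like. The function name `is_utf8_text` and the sample headers are hypothetical and are not taken from reuse-tool; the sketch also simplifies the POSIX notion of a text file to "contains no NUL bytes" and ignores line-length limits.

```python
def is_utf8_text(data: bytes) -> bool:
    """Return True if `data` has no NUL bytes and decodes as strict UTF-8.

    Hypothetical helper: a simplified stand-in for "UTF-8-encoded text file".
    """
    if b"\x00" in data:
        # NUL bytes are not allowed in a POSIX text file.
        return False
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Illustrative contents of a `.license` file (the name is made up).
ascii_header = b"SPDX-License-Identifier: MIT\n"
utf8_header = "SPDX-FileCopyrightText: Zoë Example\n".encode("utf-8")
latin1_header = "SPDX-FileCopyrightText: Zoë Example\n".encode("iso-8859-1")

print(is_utf8_text(ascii_header))
print(is_utf8_text(utf8_header))
print(is_utf8_text(latin1_header))
```

The Latin-1 variant is rejected because its single byte for `ë` is not a valid UTF-8 sequence, which is the kind of ambiguity a stated encoding would rule out.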

Kirill Elagin · Answer 1 · Fri Dec 18 2020 09:44:07 GMT+0800 (China Standard Time)

The simplest possible change would be `s/an adjacent file/an adjacent UTF-8-encoded text file/`. However, it might be worth making this even more explicit and splitting it into a separate sentence.

Matija Šuklje · Answer 2 · Fri Dec 18 2020 16:03:08 GMT+0800 (China Standard Time)

A problem I see with making this a requirement is projects where UTF-8 is not an option.

For example, already if you look at the Linux kernel source code, it is encoded in ASCII. I suspect other software that is intended to be embedded in hardware will have a similar limitation.

I do think we should at least encourage UTF-8. How about adding "That file SHOULD be UTF-8-encoded." instead? That would make it a strong recommendation, but not a requirement.

Simon McVittie · Answer 3 · Sat Jul 03 2021 02:03:45 GMT+0800 (China Standard Time)

> For example, already if you look at the Linux kernel source code, it is encoded in ASCII

UTF-8 is an ASCII-compatible encoding (a superset of ASCII where every byte value that is allowed in ASCII means the same thing in UTF-8), so every ASCII text file is automatically a valid UTF-8 text file.
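
As a quick illustration of that claim, the following sketch decodes all 128 ASCII byte values with both codecs and compares the results. The variable names and the sample license line are only illustrative.

```python
# Every byte value that ASCII allows (0x00 through 0x7F).
all_ascii_bytes = bytes(range(128))

# Decoding the same bytes as ASCII and as UTF-8 gives the same text.
same_decoding = all_ascii_bytes.decode("ascii") == all_ascii_bytes.decode("utf-8")

# Encoding ASCII-only text gives the same bytes with either codec.
license_line = "SPDX-License-Identifier: GPL-2.0-only\n"
same_encoding = license_line.encode("ascii") == license_line.encode("utf-8")

print(same_decoding, same_encoding)
```

So a project that keeps its files in pure ASCII would already satisfy a UTF-8 requirement without changing anything.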

Matija Šuklje · Answer 4 · Wed Jul 21 2021 22:21:28 GMT+0800 (China Standard Time)

> UTF-8 is an ASCII-compatible encoding (a superset of ASCII where every byte value that is allowed in ASCII means the same thing in UTF-8), so every ASCII text file is automatically a valid UTF-8 text file.

True, and that is a great feature of Unicode encodings :)

Simon McVittie · Answer 5 · Wed Jul 21 2021 23:10:34 GMT+0800 (China Standard Time)

> True, and that is a great feature of Unicode encodings :)

Not Unicode encodings in general, just UTF-8. UTF-16, UCS-2 and UCS-4 are not ASCII supersets.
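
A small sketch of the difference, using Python's built-in codecs. It assumes the little-endian variants without a byte order mark, and uses `utf-32-le` as a stand-in for UCS-4; the sample string is arbitrary.

```python
text = "MIT"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # two bytes per character here
utf32_bytes = text.encode("utf-32-le")  # four bytes per character

# UTF-8 leaves ASCII text byte-for-byte unchanged.
print(utf8_bytes == ascii_bytes)

# UTF-16 and UTF-32 pad each ASCII character with zero bytes,
# so an ASCII file is not a valid file in those encodings.
print(utf16_bytes)
print(utf32_bytes)
```

The padding bytes are why an ASCII-only tool cannot read UTF-16 or UTF-32 files as-is, while it can read ASCII-only UTF-8 files unchanged.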