`or`-ed license check breaks on licenses with "or" in the name
MartijnVdS opened this issue
The `or`-ed license check, added in #100, blindly splits the license name on the word "or".
This breaks the license check for projects like psycopg2, which has a license of `GNU Library or Lesser General Public License (LGPL)` (this gets turned into `["GNU Library", "Lesser General Public License (LGPL)"]` by the `get_license_names` function).
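The failure can be reproduced in isolation. The sketch below assumes the reverted check behaves like a case-insensitive regular-expression split on the whole word "or"; `naive_split_license` is a hypothetical stand-in, not the actual `get_license_names` code from #100.

```python
import re


def naive_split_license(license_name: str) -> list[str]:
    # Hypothetical stand-in for the reverted behaviour: split on the whole
    # word "or" regardless of case or context, then trim whitespace.
    parts = re.split(r"\bor\b", license_name, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


print(naive_split_license("GNU Library or Lesser General Public License (LGPL)"))
# ['GNU Library', 'Lesser General Public License (LGPL)']
```

The same split also cuts `GNU Lesser General Public License v2 or later (LGPLv2+)` into `GNU Lesser General Public License v2` and `later (LGPLv2+)`.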
Oops. Reverting.
@MartijnVdS Thanks for identifying this issue.
Concerns
However, I'm concerned that a patch version was used to revert #100, since the revert is itself a breaking change. I sympathize with the criticism that the current `or` split is too broad, but an immediate revert disregards the value of the `or` split in preventing a combinatorial explosion of multi-license combinations for most projects.
Furthermore, `OR` is the keyword for separating license identifiers in the SPDX format, which has become an international standard for tracking the license requirements of software dependencies.
Proposed Solution
Surely a more productive fix would have been to add a test for `GNU Library or Lesser General Public License (LGPL)`, then add it to a whitelist of licenses which would not be split by the character sequence `or`. Such an approach would satisfy both needs without too much difficulty.
Based on the Actions run that prompted this issue, it appears that @MartijnVdS would also want `GNU Lesser General Public License v2 or later (LGPLv2+)` in the proposed whitelist.
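A minimal sketch of the whitelist idea, assuming exact string matching against the full license name. `NO_SPLIT_LICENSES` and `split_license` are hypothetical names; the two entries are the licenses mentioned in this issue, and everything not on the list keeps the broad split.

```python
import re

# Hypothetical whitelist: single licenses whose names contain the word "or".
NO_SPLIT_LICENSES = {
    "GNU Library or Lesser General Public License (LGPL)",
    "GNU Lesser General Public License v2 or later (LGPLv2+)",
}


def split_license(license_name: str) -> list[str]:
    # Whitelisted names are returned whole and never split.
    if license_name in NO_SPLIT_LICENSES:
        return [license_name]
    # Everything else keeps the broad, case-insensitive split on "or".
    parts = re.split(r"\bor\b", license_name, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


print(split_license("GNU Library or Lesser General Public License (LGPL)"))
# ['GNU Library or Lesser General Public License (LGPL)']
print(split_license("MIT or Apache-2.0"))  # ['MIT', 'Apache-2.0']
```

Exact matching keeps the whitelist predictable, at the cost of adding every single-license name that happens to contain "or" by hand.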
Reading the standard, I see it only specifies `OR` in capital letters. Maybe not matching lowercase `or` would do the trick?
> Maybe not matching lowercase `or` would do the trick?
That could work. The existing code normalizes the casing to lowercase before checking licenses, but the `OR` split\* could happen before that normalization occurs.

\* Note the spaces surrounding `OR`.
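A minimal sketch of splitting on `" OR "` before lowercasing, assuming license strings arrive with their original casing. `get_license_names_sketch` is a hypothetical helper, not the implementation in #104, and it ignores the rest of the SPDX expression syntax (`AND`, `WITH`, parentheses).

```python
def get_license_names_sketch(license_name: str) -> list[str]:
    # Split on the SPDX-style keyword first, while casing is still intact:
    # uppercase "OR" with a space on each side.
    parts = license_name.split(" OR ")
    # Only afterwards normalize each part to lowercase for comparison.
    return [part.strip().lower() for part in parts if part.strip()]


print(get_license_names_sketch("MIT OR Apache-2.0"))  # ['mit', 'apache-2.0']
print(get_license_names_sketch("GNU Library or Lesser General Public License (LGPL)"))
# ['gnu library or lesser general public license (lgpl)']
```

Because the separator requires a space on both sides, uppercase words that merely contain "OR", such as `FOR`, are left alone, and the lowercase `or` in psycopg2's license never matches.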
@MartijnVdS I've opened a new PR (#104) which handles the situation you raised. Do you have any concerns with the tests/implementation there?