Support guessing from README

cybrox opened this issue 5 years ago

As far as I can see, there is no implementation or work in progress for guessing license information from a license pasted into README or README.md. Is this correct?

I've seen a lot of projects do this, especially in Elixir, and I think it should be supported. I'd be happy to write a PR for that functionality, if desired.

Unnawut Leepaisalsuwanna · Answer 1 · Tue Jun 25 2019 15:17:34 GMT+0800 (China Standard Time)

Hi @cybrox! No, we don't have that yet. As it stands, this tool only checks mix.exs and LICENSE files. Feel free to open a PR!

Sven Gehring · Answer 2 · Thu Jun 27 2019 14:49:43 GMT+0800 (China Standard Time)

I thought this would be quite tricky, but it seems that most maintainers who put the license information into the README also copy in the whole license text.

A pragmatic solution would be to solve this like #12, but I don't quite like it. Do you think there should be some kind of "scoring" system, where LICENSE* files take precedence over anything that happens to be in a README, or should all sources count the same?
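For comparison, the weighted "scoring" idea could look roughly like the Python sketch below. The source labels and the `SOURCE_WEIGHTS` values are hypothetical and not part of the tool; setting every weight to 1 gives the "all count the same" alternative.

```python
from collections import defaultdict

# Hypothetical weights: LICENSE* files and mix.exs outrank README mentions.
SOURCE_WEIGHTS = {"mix.exs": 2, "LICENSE": 2, "README": 1}


def score_licenses(findings):
    """findings: list of (source, license_name) pairs.

    Returns (license_name, score) pairs, highest score first,
    ties broken alphabetically.
    """
    scores = defaultdict(int)
    for source, license_name in findings:
        # Unknown sources fall back to a weight of 1.
        scores[license_name] += SOURCE_WEIGHTS.get(source, 1)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(score_licenses([("LICENSE", "MIT"), ("README", "Apache 2.0")]))
```

The drawback is that the top score silently wins, which is exactly the kind of deduction a discrepancy report avoids.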

Unnawut Leepaisalsuwanna · Answer 3 · Thu Jun 27 2019 15:19:13 GMT+0800 (China Standard Time)

I haven't come across a library that dumps the full license text into its README, though. Usually I see a one-liner:

[repo name] is released under [license name].

Right now, this library makes no assumptions about which source is more correct. Instead, it informs the user of any discrepancies, e.g.

Unsure (found: Apache 2.0, Apache 2)

This is because I think the risk of deducing the wrong license is too high, and it's much better and safer to fix the discrepancy at the source.
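The report-rather-than-deduce behaviour can be sketched as follows. This is a hypothetical Python illustration, not the library's actual Elixir code. `summarize_licenses` returns a name only when every source agrees, and otherwise lists all distinct names in the same "Unsure (found: ...)" shape as the example. Returning `None` when nothing was found is an assumption.

```python
def summarize_licenses(found):
    """found: license names collected from mix.exs, LICENSE, README, ...

    Returns the single name if all sources agree; otherwise an
    'Unsure' message listing every distinct name in first-seen order.
    """
    # dict.fromkeys de-duplicates while preserving insertion order.
    distinct = list(dict.fromkeys(name for name in found if name))
    if not distinct:
        return None  # assumption: nothing detected in any source
    if len(distinct) == 1:
        return distinct[0]
    return "Unsure (found: " + ", ".join(distinct) + ")"


print(summarize_licenses(["Apache 2.0", "Apache 2"]))
```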

Happy to hear more ideas, but my suggestion would be to detect the one-liners and map them with something like this: https://github.com/unnawut/licensir/blob/master/lib/licensir/naming_variants.ex
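A rough sketch of detecting the "[repo name] is released under [license name]." one-liner and normalizing the captured name through a variant table. The regular expression, the `NAMING_VARIANTS` entries, and `licenses_from_one_liners` are hypothetical and are not taken from naming_variants.ex. Unknown names pass through unchanged so a discrepancy report can still show them.

```python
import re

# Hypothetical variant table (lowercased variant -> canonical name),
# in the spirit of naming_variants.ex but not copied from it.
NAMING_VARIANTS = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2": "Apache 2.0",
    "apache 2.0": "Apache 2.0",
    "apache license 2.0": "Apache 2.0",
}

# Matches lines such as "Foo is released under the MIT License."
ONE_LINER = re.compile(
    r"is released under (?:the )?(.+?)\.?[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def licenses_from_one_liners(readme_text):
    found = []
    for match in ONE_LINER.finditer(readme_text):
        raw = match.group(1).strip()
        # Map known variants to a canonical name; keep unknown names as-is.
        found.append(NAMING_VARIANTS.get(raw.lower(), raw))
    return found


print(licenses_from_one_liners("Foo is released under the MIT License.\n"))
```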

Sven Gehring · Answer 4 · Thu Jun 27 2019 15:42:24 GMT+0800 (China Standard Time)

There are quite a few that do it, like ex_aws, decimal, mix-test.watch, ..., so I think matching against the whole license text should be supported as well.
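Matching a full license text pasted into a README could be sketched by normalizing whitespace (READMEs are often wrapped differently from the LICENSE file) and then looking for a distinctive phrase from each license. The `LICENSE_FINGERPRINTS` table and `licenses_from_full_text` are hypothetical; a real implementation would probably need more robust matching, e.g. tolerating Markdown emphasis or alternate wordings of the same license.

```python
import re

# Hypothetical fingerprints: one distinctive phrase per license,
# stored lowercased with single spaces.
LICENSE_FINGERPRINTS = {
    "MIT": "permission is hereby granted, free of charge, to any person obtaining a copy",
    "Apache 2.0": "licensed under the apache license, version 2.0",
}


def normalize(text):
    # Collapse line breaks and indentation so hard-wrapped text still matches.
    return re.sub(r"\s+", " ", text).strip().lower()


def licenses_from_full_text(readme_text):
    haystack = normalize(readme_text)
    return [name for name, phrase in LICENSE_FINGERPRINTS.items() if phrase in haystack]


readme = "## License\n\nPermission is hereby granted, free of charge,\nto any person obtaining a copy of this software"
print(licenses_from_full_text(readme))
```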

I would definitely like to support one-liners and things like the example below. I think using a mapping like the naming variants would work very well.

### License
MIT
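For the heading form shown above, a hypothetical `license_from_heading` helper could take the first non-empty line under a "License" (or "Licence") heading. This sketch assumes ATX-style `#` headings only; the returned string would still need to go through a naming-variant mapping before it is compared with other sources.

```python
import re

# ATX heading: 1-6 '#' characters, a space, the title, optional closing '#'s.
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")


def license_from_heading(readme_text):
    """Return the first non-empty line under a 'License' heading, or None."""
    lines = readme_text.splitlines()
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if match and match.group(1).lower() in ("license", "licence"):
            for following in lines[i + 1:]:
                if HEADING.match(following):
                    return None  # reached the next section without content
                if following.strip():
                    return following.strip()
    return None


print(license_from_heading("# Foo\n\n### License\nMIT\n"))
```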

Unnawut Leepaisalsuwanna · Answer 5 · Thu Jun 27 2019 15:47:11 GMT+0800 (China Standard Time)

Oh wow. Thanks for the examples. I've used those before and didn't realise they put the full text there (granted, MIT is short enough for that). Then it looks good to me to try matching them.

If you also agree that we should report discrepancies rather than deduce a license ourselves, feel free to push this ahead!