unnawut / licensir

An Elixir mix task that lists all the licenses used by your Mix project dependencies.

Home Page: https://hex.pm/packages/licensir

Support guessing from README

cybrox opened this issue · comments

As far as I can see, there is no implementation or work in progress for guessing license information from license text pasted into README or README.md. Is this correct?

I've seen a lot of projects do this, especially in Elixir, and I think it should be supported. I would be happy to write a PR for that functionality, if desired.

Hi @cybrox! No, we don't have that yet. As it is now, this tool only checks mix.exs and LICENSE files. Feel free to open a PR!

I thought this would be quite tricky, but it seems most maintainers who put the license information into the README just copy the whole license text in as well.

A pragmatic solution would be to solve this like #12 but I don't quite like it. Do you think there should be a kind of "scoring" system, where the LICENSE* files are more important than anything that might happen to be in a README or should they all count the same?

I haven't come across a library that dumps the license text into the README though. Usually I see a one-liner:

[repo name] is released under [license name].

Right now this library makes no assumptions about which source is more correct. Instead, it informs the user of the discrepancies, e.g.

Unsure (found: Apache 2.0, Apache 2)

This is because I think the risk of deducing the wrong license is too high and it's much better/safer to fix the discrepancy at the source.
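A minimal sketch of that "report, don't deduce" behaviour could look like the following. The module name `LicenseReport`, the function `summarize/1`, and the `"Undefined"` placeholder are hypothetical and are not taken from licensir's code; the only thing borrowed from the thread is the `Unsure (found: ...)` output format.

```elixir
defmodule LicenseReport do
  # Hypothetical helper, not part of licensir. It combines the license names
  # found in different sources without ranking one source above another.
  def summarize(found) when is_list(found) do
    found
    # Sources that yielded nothing are passed in as nil and ignored.
    |> Enum.reject(&is_nil/1)
    # Keep each distinct name once, in order of first appearance.
    |> Enum.uniq()
    |> case do
      # Placeholder wording for "no license found" (an assumption).
      [] -> "Undefined"
      # All sources agree, so the name can be reported as-is.
      [single] -> single
      # Sources disagree: list everything and let the user fix the source.
      many -> "Unsure (found: " <> Enum.join(many, ", ") <> ")"
    end
  end
end

# Two sources that disagree on the spelling:
IO.puts(LicenseReport.summarize(["Apache 2.0", "Apache 2"]))
```

With this shape, adding the README as a third source would only mean appending one more entry to the list, so no scoring between sources is needed.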

Happy to hear more ideas, but my suggestion would be to detect the one-liners and map them with something like this: https://github.com/unnawut/licensir/blob/master/lib/licensir/naming_variants.ex

There are quite a few that put the full license text in the README, like ex_aws, decimal, mix-test.watch, ..., so I think matching the whole license text should be supported as well.
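One possible way to match a full license text pasted into a README is to look for a short phrase that appears in the standard text of each license. The sketch below is only an illustration under that assumption: `FullTextGuesser`, `guess/1`, and the marker list are hypothetical names, and a real implementation would probably need more licenses and more careful matching.

```elixir
defmodule FullTextGuesser do
  # Hypothetical sketch, not licensir code. Each license is identified by a
  # lower-cased phrase taken from its standard text or notice.
  defp markers do
    [
      {"MIT", "permission is hereby granted, free of charge, to any person obtaining a copy"},
      {"Apache 2.0", "licensed under the apache license, version 2.0"}
    ]
  end

  # Returns the names of all licenses whose marker phrase occurs in the README.
  def guess(readme) when is_binary(readme) do
    # Lower-case and collapse whitespace so line wrapping does not matter.
    normalized =
      readme
      |> String.downcase()
      |> String.replace(~r/\s+/, " ")

    for {name, phrase} <- markers(), String.contains?(normalized, phrase), do: name
  end
end

readme = """
## License

Permission is hereby granted, free of charge,
to any person obtaining a copy of this software
"""

IO.inspect(FullTextGuesser.guess(readme))
```

Returning a list rather than a single name keeps the result compatible with reporting discrepancies instead of picking a winner.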

I would definitely like to support one-liners and things like the example below. I think using a mapping like the naming variants would work very well.

### License
MIT
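To make the one-liner and heading cases concrete, here is a rough sketch that extracts a candidate name with two regular expressions and then maps it through a variants table. Everything in it is an assumption for illustration: `ReadmeGuesser`, the regexes, and the table entries are hypothetical and do not reproduce the contents of naming_variants.ex.

```elixir
defmodule ReadmeGuesser do
  # Hypothetical stand-in for a naming-variant mapping (entries are assumptions).
  defp variants do
    %{
      "mit" => "MIT",
      "mit license" => "MIT",
      "apache 2" => "Apache 2.0",
      "apache 2.0" => "Apache 2.0"
    }
  end

  # Matches "... is released under [license name]." on a single line.
  defp one_liner, do: ~r/(?:released|licensed|distributed) under (?:the )?(.+?)\.?\s*$/im

  # Matches a "License" heading followed by the name on the next non-empty line.
  defp heading, do: ~r/^#+\s*licen[sc]e\s*\n+\s*(.+?)\s*$/im

  # Returns the distinct, normalized license names found in the README.
  def guess(readme) when is_binary(readme) do
    [one_liner(), heading()]
    |> Enum.flat_map(&Regex.scan(&1, readme, capture: :all_but_first))
    |> List.flatten()
    |> Enum.map(&normalize/1)
    # Candidates that are not in the variants table are dropped.
    |> Enum.reject(&is_nil/1)
    |> Enum.uniq()
  end

  defp normalize(name) do
    key = name |> String.trim() |> String.downcase()
    Map.get(variants(), key)
  end
end

IO.inspect(ReadmeGuesser.guess("Foo is released under MIT."))
IO.inspect(ReadmeGuesser.guess("# Foo\n\n### License\nMIT\n"))
```

Dropping candidates that are not in the table is a deliberately conservative choice in this sketch; a heading followed by the full license text simply yields nothing here and would have to be handled by full-text matching instead.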

Oh wow. Thanks for the examples. I've used those before and didn't realise they put the full text there (granted, MIT is short enough to do that). Looks good to me to try to match them then.

If you also agree that we should report discrepancies to the user rather than deducing the license ourselves, feel free to push this ahead!