License Inconsistencies - Please Clarify

Question

FoobarProtocol opened this issue a year ago

Hello! Great job on publishing the model and all the updates you all have made since then.

I have just one question, but I think it is really important to answer because it has profound implications for anyone planning to use the code or models in this repo (either as-is or by augmenting, modifying, or fine-tuning them for some other purpose).

What is the True License for WizardCoder?

I ask because it was initially released with a 100% permissive license on HuggingFace, but in the README.md on there, you all noted that you would be attaching an 'NC' (non-commercial) stamp on it. Then this repo was uploaded but in its initial iteration, there were no license files attached to the model at all.

Then, later on, in commit e8cdb43, several license files were added to both WizardCoder and WizardLM that placed both models under a CC-BY-NC 4.0 license for all parts except the data that was used (since all of it had a permissive license to begin with, I'm assuming?).

Now, I see the README.md has been updated to show the following:

| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | WizardEval | HumanEval | License |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WizardLM-13B-V1.2 | 🤗 HF Link | | 7.06 | 89.17% | 101.4% | 36.6 pass@1 | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | | 6.76 | 86.32% | 99.3% | 25.0 pass@1 | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | | 7.01 | | 97.8% | 37.8 pass@1 | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | | 6.35 | 75.31% | 89.1% | 24.0 pass@1 | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | | | 78.0% | 19.1 pass@1 | Non-commercial |
| WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | | | | 57.3 pass@1 | OpenRAIL-M |

Above, we can see:

The badges state that the code license is Apache 2.0 but the Data License is CC-BY-NC 4.0.
However, the table that you all provided states that the WizardCoder model is now licensed under 'OpenRAIL-M'.

OpenRAIL-M is the same license that StarCoder and its downstream iterations use. You all hyperlink the OpenRAIL-M license in your table, which brings us here.

That link holds the full text of the OpenRAIL-M license. On that page it states, "This License Agreement strives for both the open and responsible Use of the accompanying Model. Openness here is understood as enabling users of the Model on a royalty free basis to Use it, modify it, and even share commercial versions of it. Use restrictions are included to prevent misuse of the Model."

This license is different from the Apache 2.0 and CC-BY-NC 4.0 licenses that you all slapped on the constituent parts of the model. OpenRAIL-M also contradicts at least the CC-BY-NC 4.0 license, since it permits commercial use of the model, yet the model itself was created with data that you all have licensed under CC-BY-NC 4.0. (I'm assuming so, because even this is not well defined: some of the data used to create the WizardCoder model was permissively licensed, which would make the CC-BY-NC 4.0 license incompatible with the license requirements of the data that was aggregated and used in the training of StarCoder/StarChat.)

Please Provide Clarifications and a Final Verdict

I don't mean to come across like an ass, but it would be helpful (for the community) if those responsible for managing this project reached a clear consensus on how they intend for it to be licensed and used. It may even help for you all to take the time to explain what you're hoping to achieve with your licensing, so that your wishes can be adhered to or you can at least receive community feedback on what the best option may be.

It seems that the team behind this project wishes to stay faithful to the concept of Open Source development and to open development and implementations in general. If that's the case, then I implore you all to consider the Open Source Initiative's scathing reprimand of Facebook's Llama 2 license requirements here.

Specifically, it is stated that, "Unfortunately, the tech giant has created the misunderstanding that Llama 2 is 'open source' - it is not. Even assuming the term can be validly applied to a large language model comprising several resources of different kinds, Meta is confusing 'open source' with 'resources available to some users under some conditions', two very different things. We've asked them to correct their misstatement."

Very specifically, they state, "Open Source is premised on the understanding that everyone gets to share no matter who you are. The commercial limitation in paragraph 2 of LLAMA COMMUNITY LICENSE AGREEMENT is contrary to that promise in the OSD."

Given this unequivocal statement by the OSI, it's clear that they would find WizardCoder (under its former license structure) to fall outside of the proper definition of Open Source.

It's worth noting that the usage restrictions, even for the sake of prohibiting folks from leveraging the AI for harmful purposes, also violate the OSI's definition of open source per the OSD. I'm willing to concede this point since it seems that AI models are being evaluated under a different definition of Open Source for some reason; that notwithstanding, it is hard to imagine that the OSI will ever agree that restricting commercial use of a model could fall under the definition of 'open source'.

Final Verdict Options

Either the model(s) have restrictive licenses and are not open source per the definition of open source adhered to by the Open Source Initiative (OSI), or they can be re-used for commercial purposes and can fairly be called open source.

As of right now, I'm not sure how anyone could intelligibly discern what licensing structure the WizardCoder model is under, which is problematic since the implications laden in the difference between 'commercial' and 'non-commercial' are significant.

I'm personally advocating for the team to place all models and their relevant assets under the same permissive license for the sake of open source and driving innovation in the AI space, in general.

ChiYeung Law · Answer 1 · Fri Aug 04 2023 16:06:50 GMT+0800 (China Standard Time)

Sorry for the inconsistencies. We are new to licensing issues. Our belief is that "open-source LLM belongs to everyone". Thus, the license of WizardCoder will remain the same as StarCoder's.