termsurf / seed.text

Community Literature Data

@wavebond/seed.text

Introduction

The goal of seed.text is to collect structured literature data for general consumption. All of this data will be hosted on chat.surf, a website devoted to literature and languages across the globe and throughout history. If you see something there that needs to be fixed, this is probably the repo in which to make the change.

For more related open community data projects, see seed.

Data Sources

We collect data from a variety of places, both on the web and in physical books. Here is a brief overview of the data sources we have used so far.

  • ctext.org: This is where many of the Chinese texts come from.
  • biblehub.com: Where we get snippets and links to the Christian Bible translations.
  • SATDB: Where some of the Chinese Buddhist texts come from.
  • tipitaka.org: Where the original XML data for the Tripitaka comes from.

Contribute

If you have some structured language data to work with, please open an issue or PR with a link to the content (whether as spreadsheets, JSON, etc.) so we can add it to the collection.

License

As much of this project as possible is released into the public domain (everything that does not already carry another license). This is because this "data" (even structured data) legally can't be "owned", as it is public knowledge. So we simply make it available for free.

CC0

To the extent possible under law, WaveBond has waived all copyright and related or neighboring rights to seed. This work is published from the United States.

If specific datasets have existing licenses, like the Chinese cedict dataset, which uses the "Creative Commons Attribution-ShareAlike 4.0 International License", then those datasets are governed by that license. We will do our best to keep the license data intact for projects as we go, but if we make any mistakes or could improve things in this area, please reach out and let us know how we can do better.

WaveBond

This is being developed by the folks at WaveBond, a California-based project for helping humanity master information and computation. Find us on Twitter, LinkedIn, and Facebook. Check out our other GitHub projects as well!
