thishanr / language-resources

Datasets and tools for basic natural language processing.

Language Resources and Tools

Datasets and scripts for basic natural language and speech processing.

This is not an official Google product.

Natural Languages

Directory   Language
af          Afrikaans
bn          Bengali / Bangla
hi_ur       Hindi & Urdu
is          Icelandic
jv          Javanese
lo          Lao
my          Burmese / Myanmar
si          Sinhala
xh          Xhosa
zu          Zulu
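
The directory names in the table above appear to be ISO 639-1 language codes, with hi_ur combining Hindi (hi) and Urdu (ur). That convention is an inference from the table, not something this README states. Under that assumption, the minimal sketch below looks up which directory covers a given language code. LANGUAGE_DIRS and directories_for_code are hypothetical names for illustration only and are not part of the repository.

```python
# Directory-to-language mapping copied from the table in this README.
# Assumption: each directory name is one or more ISO 639-1 codes joined by "_".
LANGUAGE_DIRS = {
    "af": "Afrikaans",
    "bn": "Bengali / Bangla",
    "hi_ur": "Hindi & Urdu",
    "is": "Icelandic",
    "jv": "Javanese",
    "lo": "Lao",
    "my": "Burmese / Myanmar",
    "si": "Sinhala",
    "xh": "Xhosa",
    "zu": "Zulu",
}


def directories_for_code(code: str) -> list[str]:
    """Return the directories whose name contains the given language code.

    Hypothetical helper; splits each directory name on "_" so that a
    combined directory such as "hi_ur" matches both "hi" and "ur".
    """
    return sorted(d for d in LANGUAGE_DIRS if code in d.split("_"))


if __name__ == "__main__":
    print(directories_for_code("ur"))  # ['hi_ur']
    print(directories_for_code("zu"))  # ['zu']
```

Splitting on "_" rather than using a substring test avoids false matches, since a plain substring search for "u" or "h" would match unrelated directories.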

Tools

We include a few tools for working with the natural language datasets. These tools are written in C++ and Python and are built with Bazel. To compile and use them, install a recent version of Bazel; release 0.2.0 or later is required.
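
Because the tools require Bazel 0.2.0 or later, it can help to check the installed version before building. The sketch below is one possible approach, not a script shipped with this repository. It assumes that `bazel version` prints a line of the form "Build label: X.Y.Z" (pre-release suffixes such as "-rc1" are ignored), and the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Minimum Bazel release stated in this README.
MIN_BAZEL = (0, 2, 0)


def parse_build_label(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `bazel version` output.

    Assumes a line starting with "Build label: X.Y.Z"; any suffix after the
    patch number is ignored. Returns None if no such line is found.
    """
    match = re.search(r"^Build label:\s*(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, ...] = MIN_BAZEL) -> bool:
    """Compare version tuples numerically, so (0, 10, 0) counts as newer than (0, 2, 0)."""
    return version >= minimum


def check_installed_bazel() -> bool:
    """Run `bazel version` and report whether the result satisfies MIN_BAZEL."""
    try:
        result = subprocess.run(
            ["bazel", "version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Bazel is not on PATH, or the command failed.
        return False
    version = parse_build_label(result.stdout)
    return version is not None and meets_minimum(version)


if __name__ == "__main__":
    print("Bazel OK" if check_installed_bazel() else "Bazel missing or too old")
```

Comparing integer tuples instead of raw strings matters here: as strings, "0.10.0" would sort before "0.2.0".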

License

Unless otherwise noted, all original files are licensed under the Apache License, Version 2.0.

Where explicitly noted, some datasets are instead licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The directory third_party/ contains third-party works, which are included under the respective licenses of their upstream projects. See third_party/README.md for further details.
