LUO LANGUAGE DATASET FOR NER
About
This repo contains a Luo corpus for Named Entity Recognition (NER). The text comes from the news domain and was scraped from Radio Ramogi. I did this as part of the Masakhane NER project.
NLP, NER, Masakhane
Table of Contents
About Dataset
The sentences were obtained from the Ramogi FM website: https://rmsradio.co.ke/brands/ramogi-fm/
Dates published: 1/9/2018 - 10/3/2021. See README.txt for the most up-to-date information.
Categories
See README.txt for the most up-to-date information.
Repo Structure
This repo contains 3 main files of interest.
1. README.md
This file
2. README.txt
Contains a statistical description of the data: news domains, publication dates, and collection dates
3. LUO.txt
Contains a cleaned compilation of the text
The rest are just files used in the collection and cleaning process.
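As a quick way to inspect LUO.txt after cloning, here is a minimal Python sketch. It assumes the file is plain UTF-8 text with one sentence per line, which this README does not confirm, so check the file itself and README.txt first. The helper names `load_sentences` and `corpus_stats` are hypothetical and not part of this repo.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace.

    Assumption: one sentence per line. Adjust if LUO.txt is laid out differently.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def corpus_stats(sentences):
    """Count sentences, whitespace-separated tokens, and distinct tokens."""
    tokens = [tok for sentence in sentences for tok in sentence.split()]
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
    }


if __name__ == "__main__":
    corpus_path = Path("LUO.txt")  # assumes the script is run from the repo root
    if corpus_path.exists():
        print(corpus_stats(load_sentences(corpus_path)))
```

The counts are only a rough sanity check; the authoritative statistics are the ones in README.txt.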
Clone
- Clone this repo to your local machine using
https://github.com/Pogayo/Luo-News-Dataset
Contributing
To get started...
Step 1
- Option 1: Fork this repo!
- Option 2: Clone this repo to your local machine using https://github.com/Pogayo/Luo-News-Dataset
Step 2
- HACK AWAY!
Step 3
- Create a new pull request
Team
- We are a small team. Join us and let's put Luo on the NLP Map together!
FAQ
- How do I collect the sentences?
- Go to the Ramogi website. Typically, you will only find the latest news there.
- If you have exhausted the latest news, go to the web archive to get links to earlier news.
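The archive step above can be sketched in Python. This assumes "the web archive" means the Internet Archive's Wayback Machine and uses its CDX search endpoint to list archived pages under the Ramogi FM URL. This is only one possible approach, not necessarily how the dataset was collected. The function names are hypothetical, and the sketch only builds URLs; it does not make any network request.

```python
from urllib.parse import urlencode

# Wayback Machine CDX search endpoint (assumed to be the archive meant above).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site, start, end, limit=50):
    """Build a CDX query URL listing archived captures under `site`.

    `start` and `end` are timestamp prefixes such as "2018" or "20180901".
    """
    params = {
        "url": site,
        "matchType": "prefix",        # include every page under the given path
        "from": start,
        "to": end,
        "output": "json",
        "filter": "statuscode:200",   # keep successful captures only
        "collapse": "urlkey",         # one row per distinct URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp, original):
    """Return the Wayback Machine address of one archived capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


if __name__ == "__main__":
    print(build_cdx_query("rmsradio.co.ke/brands/ramogi-fm/", "2018", "2021"))
```

Opening the printed URL in a browser should return a JSON list of captures. The timestamp and original URL from each row can then be passed to `snapshot_url` to reach the archived article.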
Support me
I am in the process of setting up a wallet. Feel free to reach out to me so that I can give you other payment details in the meantime.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.