ahurriyetoglu / portuguese-nlp

NLP work on Brazilian Portuguese newswire text

portuguese-nlp

NLP work on Brazilian Portuguese newswire text

You can browse the dataset online and see the annotations on Drive.

We have x newswire articles collected between 1994 and 2016. Because the articles are in HTML format, during preprocessing we first strip the tags, then rename each file so that its date path becomes part of the file name, like this:

folca/data/2005/01/01/19.html --> folca/parsed-data/2005_01_01_19.html

and collect them all in one folder.
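
The preprocessing described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual script: the helper names (`strip_tags`, `flat_name`, `flatten_corpus`), the UTF-8 encoding, and the choice to drop `<script>` and `<style>` contents are all assumptions made for the example.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def flat_name(path, root):
    """Turn <root>/2005/01/01/19.html into 2005_01_01_19.html."""
    return "_".join(Path(path).relative_to(root).parts)


def flatten_corpus(src_root, dst_dir):
    """Clean every .html file under src_root and write it into dst_dir."""
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_root.rglob("*.html")):
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        cleaned = strip_tags(src.read_text(encoding="utf-8", errors="replace"))
        (dst_dir / flat_name(src, src_root)).write_text(cleaned, encoding="utf-8")
```

With the layout shown above, this would be called as `flatten_corpus("folca/data", "folca/parsed-data")`. Older articles may use a different encoding such as Latin-1, in which case the `encoding` argument would need adjusting.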

Languages

Python 93.4%, Shell 6.6%