banglakit / corpus-builder

Toolkit for compiling a corpus from various sources

Home page: https://github.com/banglakit/corpus-builder/wiki/Status




A sufficiently large body of text is essential for NLP tasks. This tool has a single purpose: building a large collection of text documents from the web.

A practical understanding of Python and Scrapy is required to use the tool.

Example Usage

scrapy crawl bangladesh_pratidin -a start_date='2016-06-01' -a end_date='2016-06-05' -o test3.csv
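Scrapy passes each `-a name=value` option to the spider's constructor as a string keyword argument, so `start_date` and `end_date` above arrive as plain 'YYYY-MM-DD' strings. The sketch below shows one way a date-bounded spider might expand them into individual days. It is an illustration only: the helper name `iter_dates` is hypothetical, and treating `end_date` as inclusive is an assumption, not necessarily how the bangladesh_pratidin spider behaves.

```python
from datetime import date, timedelta


def iter_dates(start_date: str, end_date: str):
    """Yield each day from start_date to end_date, both inclusive.

    Hypothetical helper: both arguments are ISO-format strings
    ('YYYY-MM-DD'), as received from `scrapy crawl ... -a start_date=...`.
    Including end_date in the range is an assumption of this sketch.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


if __name__ == "__main__":
    for day in iter_dates("2016-06-01", "2016-06-05"):
        # A spider could build one archive-page request per day here.
        print(day.isoformat())
```

With the dates from the example command, this prints the five days from 2016-06-01 to 2016-06-05. The `-o test3.csv` option then tells Scrapy to write the scraped items to a CSV file.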

About

License: MIT License


Languages

Language: Python 100.0%