ti-a-go / data

NLP Sandbox

data

This project stores the data used to test open information extraction methods.

The initial data source is Wikipedia.

What information do I want to extract?

Information about black history.

Application

Search for a Wikipedia page based on user input and save the raw text in the wikipedia/pages/ folder.
- Handle the Wikipedia search disambiguation problem.

Process the raw text using a spaCy model.

Save the main data as CSV files in a folder named after the current date.
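
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the file-name sanitizing rule, the `entities.csv` file name, the ISO date format for the folder, and the `en_core_web_sm` model are all assumptions. The Wikipedia search and disambiguation handling are left out, since the README does not say which client is used.

```python
import csv
import re
from datetime import date
from pathlib import Path


def save_raw_text(title, text, root="."):
    """Write raw page text to <root>/wikipedia/pages/<title>.txt (hypothetical helper)."""
    pages_dir = Path(root) / "wikipedia" / "pages"
    pages_dir.mkdir(parents=True, exist_ok=True)
    # Assumed convention: collapse anything that is not a word character or hyphen into "_".
    safe_name = re.sub(r"[^\w\-]+", "_", title).strip("_")
    path = pages_dir / f"{safe_name}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def extract_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, label) pairs from a spaCy pipeline (model name is an assumption)."""
    import spacy  # imported lazily so the storage helpers work without spaCy installed

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def save_rows(rows, header, name="entities.csv", root=".", day=None):
    """Write rows to <root>/<YYYY-MM-DD>/<name> and return the file path (hypothetical helper)."""
    day = day or date.today()
    out_dir = Path(root) / day.isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

With these helpers, one run might call `save_raw_text(title, text)`, pass the text to `extract_entities`, and hand the resulting pairs to `save_rows(pairs, ["text", "label"])`. Named entities are only one possible choice of "main data"; an open information extraction method would likely store relation triples instead.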

About

License: Apache License 2.0