renatoosousa / webdata

Web data extraction


Description of Data
===================

Extractor:

Run python extractor/study_codes/reverse_index.py to see all the index lists.

How the data is organized:

extractor > study_codes > results: first part, with the extracted data

extractor > study_codes > results > docs: second part, with the pre-processed data, separated by document, each with its ID
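As an illustration of what an index list in a reverse (inverted) index usually looks like, here is a minimal sketch. It is not the code in reverse_index.py; the function name build_inverted_index and the sample documents are assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it.

    `docs` is a dict of {doc_id: text}. Tokenization here is a plain
    lowercase whitespace split, which is an assumption for the example.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Convert the sets to sorted lists so each index list has a stable order.
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "web data extraction",
    2: "data classification",
    3: "web crawler",
}
index = build_inverted_index(docs)
for term in sorted(index):
    print(term, index[term])
```

Each printed line pairs a term with the IDs of the documents it appears in, which is the general shape of an index list.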


Classifier:

How the data is organized:

Bag of words:

classifier > dataframe > README

Information about the classifiers:

classifier > classifiers > 'algorithm' (each classifier has its own folder, which contains a log.txt with its information)
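For readers unfamiliar with the term, a bag of words represents each document as a vector of term counts over a shared vocabulary. The sketch below shows the general idea with the standard library only. It does not reproduce the dataframe described in the README above; the function name bag_of_words and the sample texts are assumptions.

```python
from collections import Counter


def bag_of_words(texts):
    """Return (vocabulary, rows), where each row holds the term counts
    of one text, in vocabulary order.

    Tokenization is a lowercase whitespace split (an assumption).
    """
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for terms that are missing from this text.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical sample texts.
vocabulary, rows = bag_of_words(["web data web", "data extraction"])
print(vocabulary)
for row in rows:
    print(row)
```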
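Given that layout, the logs of all classifiers can be gathered in one pass. The following is a sketch that assumes it is run from the repository root and that every log.txt is a text file; the helper name collect_logs is hypothetical.

```python
from pathlib import Path


def collect_logs(root="classifier/classifiers"):
    """Return {algorithm_folder_name: contents of its log.txt}.

    Returns an empty dict if `root` does not exist.
    """
    root_path = Path(root)
    logs = {}
    if not root_path.is_dir():
        return logs
    # One sub-folder per algorithm, each expected to hold a log.txt.
    for log_file in sorted(root_path.glob("*/log.txt")):
        logs[log_file.parent.name] = log_file.read_text(
            encoding="utf-8", errors="replace"
        )
    return logs


for name, text in collect_logs().items():
    print(f"== {name} ==")
    print(text)
```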


Languages

Python 64.4%, Jupyter Notebook 16.0%, HTML 10.3%, JavaScript 8.2%, CSS 1.0%