jpcbertoldo / pymdr

Python implementation of Mining Data Records.

Reference paper

Liu, B., Grossman, R., & Zhai, Y. (2003). Mining data records in web pages. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 601–606). New York, NY, USA: Association for Computing Machinery.

Installation

The setup.py has not been tested yet and is not safe to use. Please follow the instructions below.

Python Version 3.6.9

Instructions

  • Clone or download this repo.

  • Open a terminal in the root directory of the project.

  • Make sure you are using the right Python version (3.6.9).

python3 -V
  • Make sure you have virtualenv installed.
pip install virtualenv==20.0.18
  • Create a virtual environment and install the requirements.
# in case you already have another virtualenv activated
deactivate

virtualenv venv -p python3.6
source ./venv/bin/activate
pip install -r requirements/dev.txt
  • Install graphviz (replace apt-get with your package manager if you are not on Ubuntu).
sudo apt-get install graphviz
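  • Add the src module to the Python import path (sys.path) of the virtualenv through a .pth file.
# path of the .pth file inside the virtualenv's site-packages
PTH_FILE="$(pwd)/venv/lib/python3.6/site-packages/src.pth"
touch "${PTH_FILE}"
# each line of a .pth file is appended to sys.path at interpreter startup
echo "$(pwd)/src/" >> "${PTH_FILE}"
# re-activate the virtualenv
deactivate
source ./venv/bin/activate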
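
Once the steps above are done, a short script can confirm that the environment looks as expected. The following is only a sketch and not part of the repository: the function name `check_environment` and the names of the individual checks are made up for illustration. It assumes that the virtualenv is active, that Graphviz provides the `dot` executable, and that the .pth file has placed `<project root>/src` on `sys.path`.

```python
import os
import shutil
import sys


def check_environment(project_root, expected_version=(3, 6)):
    """Return a dict mapping a check name to True/False (hypothetical helper)."""
    src_dir = os.path.abspath(os.path.join(project_root, "src"))
    # Normalise sys.path entries so that the trailing slash written to the
    # .pth file does not cause a false negative.
    normalised_path = {os.path.abspath(p) for p in sys.path if p}
    return {
        # Major/minor version of the running interpreter.
        "python_version": sys.version_info[:2] == tuple(expected_version),
        # Inside a virtual environment sys.prefix differs from the base
        # prefix (older virtualenv releases set sys.real_prefix instead).
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix)
        or hasattr(sys, "real_prefix"),
        # Graphviz installs the `dot` command-line tool.
        "graphviz_dot_on_path": shutil.which("dot") is not None,
        # The .pth file should have added <project root>/src to sys.path.
        "src_on_sys_path": src_dir in normalised_path,
    }


if __name__ == "__main__":
    # Run from the root directory of the project.
    for name, ok in check_environment(os.getcwd()).items():
        print("{:<22} {}".format(name, "OK" if ok else "MISSING"))
```

Run it with the virtualenv's interpreter from the root of the project. Any line reported as MISSING points at the installation step that should be repeated.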

To use it in the browser

The only tested browser is Google Chrome Version 81.0.4044.92 (Official Build) (64-bit).

Start the API:

# with the terminal open in the root of the project...
./launch-api.sh
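
Before loading the extension, it can help to confirm that the API process is actually accepting connections. The sketch below is an assumption-laden helper, not something shipped with the project: `wait_for_port` is a hypothetical name, and because this README does not state which host or port `launch-api.sh` binds to, both are left as parameters that you need to look up in the script.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    Hypothetical helper: host and port must be taken from launch-api.sh.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means
            # something is listening on that port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This only checks that a TCP listener exists. It does not verify that the listener is the pymdr API or that its endpoints respond correctly.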

Install the extension on Chrome using the developer mode. See instructions on how to do this at the beginning (2nd step) of this tutorial.

Report

A report about this project is available in the root of the repo.

About

License: MIT License

