xyla-io / raspador

A Pythonic web scraping engine on top of which "bots" can be built.

raspador

A Xyla scraper.

Install

Install Visual Studio Code

Download and install Microsoft Visual Studio Code from https://code.visualstudio.com/download

Open the app and select the terminal tab in the bottom pane to run terminal commands for the remaining installation steps.

Install git

Git is a version control system used to manage development of the Raspador codebase.

OS X

Install Xcode using the App Store app, and open the Xcode app to install the command line tools.

To check that Git is installed, run this terminal command:

which git
# the path to the git executable should be printed
# if nothing is printed, git is not installed

# if git is installed, clone the Raspador repo
git clone https://github.com/xyla-io/raspador.git

Windows

Download the Git for Windows Setup from https://git-scm.com/download/win and install git.

git clone https://github.com/xyla-io/raspador.git

Install Python

Raspador is written in Python and requires Python 3 to be installed.

OS X

Install homebrew

Homebrew is a package manager for OS X, similar to a free, command-line app store (see https://brew.sh/).

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

With Homebrew, install Python 3.6.1:

brew install python3
# select version 3.6.1 (`brew switch` exists only in older Homebrew releases)
brew switch python3 3.6.1

Windows

Download and install Python 3.6.1 from https://www.python.org/downloads/windows/
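
To confirm which interpreter the terminal actually picks up, a short check like the one below can help. It is only a sketch: the minimum version `(3, 6, 1)` is an assumption taken from the version named in this section, and `is_supported` is a hypothetical helper, not part of Raspador.

```python
import sys

# Assumption: the README's "Python 3.6.1" is treated as the minimum version.
MINIMUM_VERSION = (3, 6, 1)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} meets the assumed minimum")
    else:
        print(f"Python {current} is older than the assumed minimum 3.6.1")
```

Save it to a file and run it with `python3` on OS X or `python` on Windows to see which version is found first on the path.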

Install geckodriver

geckodriver allows the Selenium Python package to drive Firefox.
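
The README does not say where geckodriver should live. A common setup is to place the executable somewhere on the `PATH`, and the sketch below checks for that using only the standard library. Treat the `PATH` requirement as an assumption about this project; `find_geckodriver` is a hypothetical helper.

```python
import shutil


def find_geckodriver(name="geckodriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_geckodriver()
    if path is None:
        print("geckodriver was not found on PATH")
    else:
        print(f"geckodriver found at {path}")
```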

Install Python virtual environment

Create a virtual Python environment for running Raspador.

# in the raspador root directory
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# install the bundled MySQL connector
cd development_packages/data_layer/packages/mysql-connector-python-2.1.7
python setup.py install
# install the data_layer package in development mode
cd ../..
python setup.py develop
# return to the raspador root directory
cd ../..
deactivate
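
A frequent mistake is installing the requirements or running Raspador while the virtual environment is not active. The sketch below shows one way to detect this from Python. It relies on the standard behaviour of `venv`, where `sys.prefix` and `sys.base_prefix` differ inside an environment; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtual environment active: {sys.prefix}")
    else:
        print("no virtual environment active; run `source .venv/bin/activate` first")
```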

Create a Firefox profile

Create a user profile in Firefox for the scraper to use.

Run

Open the raspador root directory in Visual Studio Code and run Raspador from the terminal.

source .venv/bin/activate
python main.py <CONFIGURATION> <STARTDATE> <ENDDATE>
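
For scripted runs, the same command can be assembled in Python. This is a minimal sketch under stated assumptions: the README does not specify the date format, so ISO `YYYY-MM-DD` is a guess, and `build_command` and the configuration name `example_configuration` are hypothetical.

```python
import sys
from datetime import date


def build_command(configuration, start, end):
    """Build the argument list for `python main.py <CONFIGURATION> <STARTDATE> <ENDDATE>`."""
    if start > end:
        raise ValueError("start date must not be after end date")
    # Assumption: dates are passed as ISO strings (YYYY-MM-DD).
    return [sys.executable, "main.py", configuration, start.isoformat(), end.isoformat()]


if __name__ == "__main__":
    # "example_configuration" is a placeholder, not a configuration shipped with Raspador.
    command = build_command("example_configuration", date(2020, 1, 1), date(2020, 1, 31))
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the raspador root directory with the virtual environment active.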

Docker

Install docker

apt-get update
apt-get install docker.io
# add permissions for the user who will run docker images
usermod -a -G docker <USER>

Build docker image

# in the project root
docker build -t raspador .

Run with docker

docker run --rm --privileged -p 4000:4000 -it raspador bash /usr/src/app/run.sh --help

License: MIT License


Languages

Python 92.4%, Shell 3.0%, HTML 2.3%, Dockerfile 2.2%