zhujiaqi / GC_Crawler

GuitarChina Crawler written in Python (Not in development)

GuitarChina BBS Crawler

Author: Zakk Zhu

Repository: github/zhujiaqi/GC_Crawler

Requirements:

  • Python 2.7
  • SQLite 3
  • Gmail account (optional)

Installation:

  1. Initialize gc.db.
  2. Run python crawler.py.
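
The README does not say how gc.db is initialized or what tables it holds. A minimal sketch of the first step, assuming a single hypothetical `pages` table (the table name, its columns, and the `init_db` helper are illustrative guesses, not the project's real schema):

```python
import sqlite3

# Hypothetical schema: the real layout of gc.db is not documented in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url        TEXT PRIMARY KEY,
    title      TEXT,
    body       TEXT,
    fetched_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path="gc.db"):
    """Create the database file (if missing) and the assumed table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


if __name__ == "__main__":
    init_db().close()
```

Because the statement uses `CREATE TABLE IF NOT EXISTS`, running it a second time leaves an existing database untouched.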

Features:

  • Crawl the pages and save them locally.
  • Strip the styles so you get what you need, in an organized way.
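
How the crawler strips styles is not described here. One possible approach, sketched with only the standard library, is to drop the contents of `<style>` and `<script>` elements and keep the visible text. The `TextExtractor` class and `strip_styles` function are hypothetical names, not taken from crawler.py:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring everything inside <style> and <script>."""

    SKIP = ("style", "script")

    def __init__(self):
        HTMLParser.__init__(self)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_styles(html):
    """Return the visible text of an HTML page, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Inline `style="..."` attributes disappear as well, since the sketch keeps text nodes only and discards all tags and attributes.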

You are free to use this code snippet as long as it's not for commercial purposes and stays within the restrictions of the law.

It comes with absolutely no guarantee, so if anything doesn't work as expected, bail out (or fix it).

However, you are welcome to make it better. Some nice-to-have improvements:

  1. Get past the captcha and make the login part work (so you can see and save images).

  2. An option for dumping the DB.

  3. A daemon process to run this remotely and on demand.

  4. ...
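
The DB-dump option in item 2 could be sketched with the standard `sqlite3` module, whose `Connection.iterdump()` yields the database as SQL statements. The `dump_sql` and `dump_db` helpers and the `gc_dump.sql` file name are hypothetical and not part of the current code:

```python
import io
import sqlite3


def dump_sql(conn):
    """Return the whole database as one SQL script string."""
    return "\n".join(conn.iterdump())


def dump_db(db_path="gc.db", out_path="gc_dump.sql"):
    """Write an SQL dump of db_path to out_path (UTF-8)."""
    conn = sqlite3.connect(db_path)
    try:
        with io.open(out_path, "w", encoding="utf-8") as f:
            for line in conn.iterdump():
                f.write(line + "\n")
    finally:
        conn.close()
```

The resulting file can be loaded back with `sqlite3 new.db < gc_dump.sql`.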
