luo4neck / WebSpider

Python Web Spider

This is a simple Python web spider that collects all the movies a specified douban.com user has watched and stores them in a csv file. The input is the URL of the first page of that user's douban movie list.
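The storage step described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the repository's code (the project itself uses `urllib2`, which exists only in Python 2). The function name `write_movies_csv`, the column names, and the sample records are all hypothetical.

```python
import csv
import io


def write_movies_csv(movies, out):
    """Write movie records (dicts) to the file-like object `out` as CSV.

    The column names are assumptions made for illustration only.
    Returns the number of rows written (excluding the header).
    """
    fieldnames = ["title", "rating", "date_watched"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for movie in movies:
        writer.writerow(movie)
        count += 1
    return count


# Hypothetical sample records standing in for scraped data.
sample = [
    {"title": "Movie A", "rating": "5", "date_watched": "2015-03-01"},
    {"title": "电影 B", "rating": "4", "date_watched": "2015-03-02"},
]

# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
count = write_movies_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function could be given a file opened with `open("movies.csv", "w", newline="", encoding="utf-8")`, so that Chinese titles are stored correctly.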

Test input: $ make test

To avoid an IP ban, the spider runs slowly; the test takes about 40 minutes to finish.
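The README does not say how the spider slows itself down. One common approach, assumed here purely for illustration, is a fixed pause between consecutive requests. In this Python 3 sketch the function name `fetch_all`, the delay value, and the example URLs are hypothetical; the fetcher and the sleep function are passed in so the pacing logic can be run without any network access or waiting.

```python
import time


def fetch_all(urls, fetch, delay_seconds, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is any callable that takes a URL and returns the page body.
    `sleep` is injectable so the pacing can be exercised without waiting.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        pages.append(fetch(url))
    return pages


# Demonstration with a fake fetcher and a recording "sleep"
# (no network traffic, no real delay).
recorded_pauses = []
demo_pages = fetch_all(
    [
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    ],
    fetch=lambda url: "body of " + url,
    delay_seconds=2.0,
    sleep=recorded_pauses.append,
)
```

In real use, `fetch` could wrap `urllib.request.urlopen`, the Python 3 counterpart of `urllib2.urlopen`, and `sleep` would be left at its default of `time.sleep`.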

Written entirely in Python. Libraries used: urllib2, bs4, time, re, csv, sys.

The code was written in March 2015 in Dublin, Ireland.
