wuyifan18 / spider

Spider for China Judgements Online (裁判文书网)

A spider for China Judgements Online

This project is no longer maintained and is kept for reference only.

It is intended solely for personal study and technical exchange and must not be used for commercial purposes.

Overview

This is a spider for 裁判文书网 (China Judgements Online).

Features

  • Support IP proxy
  • Support multiple processes
  • Support full crawling
  • Divide data according to decision time, region and court
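The features above can be illustrated with a minimal sketch. It is not the project's actual code. It only shows one way to split a crawl into tasks by day, region and court, assign a proxy to each task in rotation, and hand the tasks to a pool of worker processes. The region names, court names, proxy addresses and every function name here are hypothetical placeholders.

```python
from datetime import date, timedelta
from itertools import product
from multiprocessing import Pool

# Hypothetical values for illustration only; the real region list,
# court list and proxy addresses are not given in this README.
REGIONS = ["Region A", "Region B"]
COURTS = ["Court 1", "Court 2"]
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]


def days_between(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def build_tasks(start, end):
    """Split the crawl into one task per (day, region, court)."""
    combos = product(days_between(start, end), REGIONS, COURTS)
    tasks = []
    for i, (day, region, court) in enumerate(combos):
        # Rotate through the proxy list so requests are spread evenly.
        proxy = PROXIES[i % len(PROXIES)]
        tasks.append((day.isoformat(), region, court, proxy))
    return tasks


def crawl(task):
    """Placeholder worker: a real spider would fetch documents here."""
    day, region, court, proxy = task
    return f"{day} {region} {court} via {proxy}"


def run(tasks, num_processes=1):
    """Process the tasks with a pool of worker processes."""
    with Pool(processes=num_processes) as pool:
        return pool.map(crawl, tasks)


if __name__ == "__main__":
    for line in run(build_tasks(date(2016, 1, 2), date(2016, 1, 2))):
        print(line)
```

Keeping each task small (one day, one region, one court) means a failed task can be retried on its own, and full crawling amounts to widening the date range.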

Run

python spider.py -num_processes 1 -start_time 2016-1-2 -end_time 2016-1-2
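How `spider.py` reads these flags is not shown in this README. As a rough sketch, options of this shape could be parsed with the standard `argparse` module, which accepts single-dash long options such as `-num_processes`. The default value, the required flags and the date-order check below are assumptions, not the project's documented behavior.

```python
import argparse
from datetime import datetime


def parse_date(text):
    """Parse a date such as 2016-1-2 (zero padding is optional)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv=None):
    """Parse the three flags used in the run command above."""
    parser = argparse.ArgumentParser(description="Spider options (sketch)")
    # Assumed default: a single process when the flag is omitted.
    parser.add_argument("-num_processes", type=int, default=1)
    parser.add_argument("-start_time", type=parse_date, required=True)
    parser.add_argument("-end_time", type=parse_date, required=True)
    args = parser.parse_args(argv)
    # Assumed sanity check: the date range must not be reversed.
    if args.end_time < args.start_time:
        parser.error("-end_time must not be earlier than -start_time")
    return args


if __name__ == "__main__":
    options = parse_args()
    print(options.num_processes, options.start_time, options.end_time)
```

With the command above, `start_time` and `end_time` would both resolve to 2 January 2016, so the crawl covers a single day.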

Results

  • raw data

image

  • processed data

image

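A recommendation for a paid proxy service that works well: SmartProxy, a proxy IP pool provider. It advertises 100 million real residential IPs and describes itself as a professional overseas HTTP proxy vendor with tens of millions of high-quality addresses covering cities worldwide. It offers highly anonymous, stable, 100% native residential IPs and supports use cases such as social accounts, e-commerce platforms and web data collection. IPs can be obtained through an API or with username and password authentication, and both dynamic and static residential proxies are available. Most are real residential IPs and the success rate is high; I was happy with it after testing it myself. Many paid plans are currently available and the spring pricing is favorable, with dynamic residential proxies at 35% off. If you need high-quality proxy IPs, register and then contact customer service to purchase, which is slightly cheaper than buying directly.

Official website: https://www.smartproxy.cn/

Referral registration link: https://www.smartproxy.cn/regist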

About

Spider for China Judgements Online (裁判文书网)

License: MIT License


Languages

JavaScript 83.6%, Python 16.4%