xiaodin1 / WPCrawler

A web crawler for a single WordPress site

WPCrawler

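A web crawler for a single WordPress site.

Open-source libraries used:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

Uses UTF-8 encoding so that Chinese tags are recorded correctly.

Uses XAMPP's default MySQL address, localhost:3306.

Requires a local XAMPP environment.

The next update will add statistics on the tags used by each article.

A detailed explanation of how it works is available on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

(The blog hosting was set up only recently. If you run into problems accessing it, please let me know and I will try to fix them.)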

=========

A web crawler for a single WordPress site.
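Keeping a crawler on a single site usually comes down to checking each discovered link against the host of the start URL. The sketch below shows one way to do that with only the Java standard library. The class name `SameSiteFilter`, its methods, and the rule of treating `www.` as the same host are illustrative assumptions, not code taken from WPCrawler.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of WPCrawler): keeps only links on the start site's host.
final class SameSiteFilter {
    private final String siteHost;

    SameSiteFilter(String siteUrl) {
        this.siteHost = normalizeHost(URI.create(siteUrl).getHost());
    }

    // Lower-case the host and drop a leading "www." (an assumed convention).
    static String normalizeHost(String host) {
        if (host == null) {
            return "";
        }
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    // True only for absolute http/https links whose host matches the start site.
    boolean accepts(String link) {
        try {
            URI uri = new URI(link);
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return isHttp && !siteHost.isEmpty() && siteHost.equals(normalizeHost(uri.getHost()));
        } catch (URISyntaxException e) {
            return false; // malformed links are skipped
        }
    }

    // Returns the accepted links in their original order.
    List<String> filter(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (accepts(link)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SameSiteFilter filter = new SameSiteFilter("http://johnhany.net/");
        System.out.println(filter.accepts("http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/"));
        System.out.println(filter.accepts("http://example.com/"));
    }
}
```

Relative links are rejected here, so a real crawler would resolve them against the current page URL before filtering.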

Open-source libraries used:

Apache HttpComponents 4.3

HTML Parser 2.0

MySQL Connector/J 5.1.27

Requires an XAMPP environment.

The program assumes that a database named "crawler" exists on localhost at port 3306.
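As a rough illustration of that setup, the sketch below builds a JDBC URL for the "crawler" database on localhost:3306 and tries to open a connection. The class name `CrawlerDb`, the `root` user with an empty password (the usual XAMPP default), and the `useUnicode`/`characterEncoding` URL parameters for UTF-8 are assumptions and may differ from what WPCrawler actually uses. Connecting requires MySQL Connector/J on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper (not part of WPCrawler): connection settings for the local database.
final class CrawlerDb {
    static final String HOST = "localhost";
    static final int PORT = 3306;           // default MySQL port under XAMPP
    static final String DATABASE = "crawler";

    // Builds the JDBC URL; the query parameters ask Connector/J to use UTF-8.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection; fails with SQLException if MySQL or the driver is unavailable.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(HOST, PORT, DATABASE), user, password);
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl(HOST, PORT, DATABASE));
        // Assumed credentials: XAMPP's default "root" account with an empty password.
        try (Connection conn = open("root", "")) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

If the connection fails, check that MySQL is running in the XAMPP control panel and that the "crawler" database has been created.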

Tag analysis for each article will be added in the next update.

You can read more about how it works on my blog:

http://johnhany.net/2013/11/web-crawler-using-java-and-mysql/

My blog is new and not yet stable. If you have any problems accessing it, please let me know :)

Languages

Language: Java 100.0%