JasonCeng / Linux-from-scratch-site-spider

Crawl the entire Linux From Scratch site (http://www.linuxfromscratch.org/).

Linux from scratch site spider

This project grew out of a good friend's hobby of downloading entire websites to keep local copies. I happened to be practicing Python at the time, so I promised him I would try to crawl the entire Linux From Scratch site.

First, a local copy is convenient for reading offline. Second, collecting sites is a hobby in its own right. When he told me the idea, I found it very interesting too. He then added, "It would be even better if it could resume interrupted downloads and update the local copy." Well, I promise you!

Special statement: this project is not used for any commercial purpose and is purely for fun. If it infringes any of your rights, please contact me (cengzhaochuang@stu.csust.edu.cn).

To-do list

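  • Crawl the home page and parse out all of its links
  • Crawl every page of the site
  • Support resuming interrupted downloads and scheduled updates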
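
The first and third items can be sketched with the standard library alone. The code below is only an illustration of one possible approach, not the implementation in this repository: the names `LinkParser`, `extract_links`, `fetch`, and `crawl`, the JSON state-file layout, and the `max_pages` limit are all assumptions made for the example. It collects same-host links from a page and keeps the crawl queue in a JSON file, so an interrupted run can pick up where it stopped. Saving pages to disk and scheduled updates are left out.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

START_URL = "http://www.linuxfromscratch.org/"


class LinkParser(HTMLParser):
    """Collect the values of href and src attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free links on the same host as base_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for raw in parser.links:
        url, _fragment = urldefrag(urljoin(base_url, raw))
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if url not in found:
                found.append(url)
    return found


def fetch(url):
    """Download a page as text; undecodable bytes are replaced."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, state_file, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl whose queue is saved after every page.

    Running it again with the same state_file resumes the saved queue.
    """
    state_path = Path(state_file)
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"pending": [start_url], "done": []}

    fetched = 0
    while state["pending"] and fetched < max_pages:
        url = state["pending"].pop(0)
        if url in state["done"]:
            continue
        try:
            html = fetch_page(url)
        except OSError:
            html = ""  # unreachable page: mark it done and move on
        state["done"].append(url)
        fetched += 1
        for link in extract_links(html, url):
            if link not in state["done"] and link not in state["pending"]:
                state["pending"].append(link)
        # Persist progress so an interrupted run can be resumed.
        state_path.write_text(json.dumps(state))
    return state
```

Calling `crawl(START_URL, "crawl_state.json")` would fetch up to `max_pages` pages and record progress in `crawl_state.json` (a file name chosen for this example); calling it again continues from the saved queue. Passing a different `fetch_page` callable makes it easy to try the logic without touching the network.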


Languages

  • Python 59.7%
  • HTML 40.3%