a-lazy-cat / NovelDownload

This is a simple application of web scraping in Python: it scrapes novels from a download website. The spyder file uses a single-threaded scraper, which is slower but friendlier to the website's server because it avoids putting heavy load on it. The Spyder_multiProcessing_multiThreading.ipynb notebook scrapes with multiprocessing and with multithreading, respectively. Concurrent scraping, however, returns the chapters of a novel out of order, so an upgraded version uses a mark-scrape-sort method to restore the correct order. This method was used to scrape two example novels, '圣墟' and '逆天邪神', and the results are stored in the 'download' folder.
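The mark-scrape-sort idea can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the function names `scrape_in_order` and `fake_fetch`, the `example.com` URLs, and the worker count are all assumptions, and the download is simulated with a random delay so the sketch runs without network access.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_in_order(chapter_urls, fetch, max_workers=8):
    """Mark each chapter with its index, fetch concurrently, then sort by index."""
    # Mark: pair every URL with its position in the chapter list.
    marked = list(enumerate(chapter_urls))

    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Scrape: submit all downloads; they may complete in any order.
        futures = {pool.submit(fetch, url): index for index, url in marked}
        for future in as_completed(futures):
            results.append((futures[future], future.result()))

    # Sort: restore the original chapter order using the mark.
    results.sort(key=lambda pair: pair[0])
    return [text for _, text in results]


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP download (assumption)."""
    time.sleep(random.uniform(0, 0.05))  # simulate variable network latency
    return f"content of {url}"


# Hypothetical chapter URLs, used only for illustration.
chapter_urls = [f"https://example.com/novel/chapter-{n}.html" for n in range(1, 11)]
chapters = scrape_in_order(chapter_urls, fake_fetch)
```

In a real scraper, `fake_fetch` would be replaced by a function that downloads and parses a chapter page. If results do not need to be processed as they arrive, `ThreadPoolExecutor.map` is a simpler alternative, since it already yields results in input order.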

Languages

Jupyter Notebook 100.0%