Here are some spiders.
All of the spiders inherit from the BaseSpider in utils.base, which implements many useful functions, such as automatically switching request headers and parsing content with XPath.
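As a rough illustration of the two features mentioned above, here is a minimal sketch that uses only the Python standard library. It is not the actual BaseSpider code: the class name `SketchSpider`, the methods `next_headers` and `parse`, and the User-Agent strings are all hypothetical, and the real implementation may rotate headers differently or use a full XPath engine instead of the limited subset in `xml.etree.ElementTree`.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical pool of User-Agent strings (an assumption, not taken from the repo).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


class SketchSpider:
    """Illustrative stand-in for BaseSpider; names and behavior are assumptions."""

    def __init__(self, user_agents=USER_AGENTS):
        # Cycle through the pool so each request gets the next User-Agent.
        self._agents = itertools.cycle(user_agents)

    def next_headers(self):
        """Return request headers carrying the next User-Agent in the rotation."""
        return {"User-Agent": next(self._agents)}

    def parse(self, markup, path):
        """Return the text of every element matching an XPath-style path.

        ElementTree supports only a subset of XPath and needs well-formed markup.
        """
        root = ET.fromstring(markup)
        return [element.text for element in root.findall(path)]


spider = SketchSpider()
page = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"
print(spider.next_headers())
print(spider.parse(page, ".//p"))
```

Cycling through a fixed pool is only one way to switch headers; picking one at random per request would be an equally plausible design.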
I use MongoDB as the default database and Redis to control parallelism. You can find the settings in the config in each folder.
- 36kr news flashes
This spider crawls the 36kr news flashes, including the title, description, and other useful information. The result can serve as a corpus for information extraction.
- oxford words
This spider crawls words from Oxford Learner's Dictionaries for a VS Code plugin by my dear friend Zhang Yu, which completes words as you type.