Here are some spiders.
All spiders inherit from utils.base, which implements many useful functions, such as automatically switching the request headers, parsing the response content, using MongoDB as the default database, and using Redis to control parallelism. You can find the settings in the config in each folder.
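To make the base-class idea concrete, here is a minimal sketch of header switching and content parsing. It is not the actual code in utils.base: the names `BaseSpider`, `TitleSpider`, `build_request`, `save`, and the `USER_AGENTS` pool are all hypothetical, and an in-memory list stands in for the MongoDB collection so the example runs with the standard library only. Redis-based parallel control is left out.

```python
import itertools
import re
import urllib.request

# Hypothetical User-Agent pool; the real list in utils.base may differ.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


class BaseSpider:
    """Illustrative base class: rotates request headers and stores items."""

    def __init__(self, user_agents=USER_AGENTS):
        # Cycle through the pool so consecutive requests use different agents.
        self._agents = itertools.cycle(user_agents)
        # In-memory stand-in for the MongoDB collection (assumption).
        self.items = []

    def build_request(self, url):
        # Attach the next User-Agent in the pool to the outgoing request.
        headers = {"User-Agent": next(self._agents)}
        return urllib.request.Request(url, headers=headers)

    def parse(self, content):
        # Each concrete spider supplies its own parsing logic.
        raise NotImplementedError

    def save(self, item):
        self.items.append(item)


class TitleSpider(BaseSpider):
    """Toy subclass that extracts the page title from an HTML string."""

    def parse(self, content):
        match = re.search(r"<title>(.*?)</title>", content, re.S)
        return {"title": match.group(1).strip() if match else None}
```

A concrete spider would subclass the base, override `parse`, and pass each parsed item to `save`; in the real project the storage and concurrency settings would come from the config in each folder.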
- 36kr news flashes
This spider crawls the 36kr news flashes, collecting the title, description, and other useful fields. The result can serve as a corpus for information extraction.
- oxford words
This spider crawls words from Oxford Learner's Dictionaries for my dear friend Zhang Yu's VS Code plugin, which completes words as you type.