qzcool / Media-Scrapper

A media scraper that values simplicity and performance.

Media-Scrapper

A media scraper that values simplicity and performance. It automatically downloads the best stories under a given tag. Built on top of BeautifulSoup4 and Requests.

Functionality

  1. Download all media of a story (URL) into a folder named after the story.
  2. Build a list of stories by tag (topic) and download all media of those stories (URLs).
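
The first function can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `story_folder_name` and `story_media_urls` are hypothetical, and the sketch assumes that media appear as `src` attributes on `<img>`, `<video>`, or `<source>` tags and that the story name is the page `<title>`. Real source sites may need different selectors.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def story_folder_name(html: str, fallback: str = "untitled") -> str:
    """Derive a filesystem-safe folder name from the page <title> (assumed to hold the story name)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Replace characters that are invalid in file names on common platforms.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", title).strip(" ._")
    return safe or fallback


def story_media_urls(html: str, base_url: str) -> list[str]:
    """Collect absolute, de-duplicated media URLs in page order."""
    soup = BeautifulSoup(html, "html.parser")
    urls: list[str] = []
    for tag in soup.find_all(["img", "video", "source"]):
        src = tag.get("src")
        if not src:
            continue  # tag without a source attribute
        absolute = urljoin(base_url, src)  # resolve relative links
        if absolute not in urls:
            urls.append(absolute)
    return urls
```

The second function would then apply the same two helpers to every story URL found on a tag page.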

Sample Story Media

Supported Media Sources

  1. 福利秀
  2. 91自拍论坛

Deployment Instructions

  1. Download the repository to the local path where the media will be saved.
  2. Install the package dependencies:
     • tqdm: pip install tqdm
     • fake-useragent: pip install fake-useragent
     • BeautifulSoup4: pip install beautifulsoup4
     • Requests: pip install requests
  3. Open the MediaScrapper.ipynb file with Jupyter Notebook.
  4. Select the media source (code block).
  5. Change the URL or tag to suit your needs.
  6. Run the code block to start scraping media.
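
Before opening the notebook, it can help to confirm that the four dependencies are importable. The check below is an optional sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names and are not part of the repository. Note that the pip names and the import names differ for two packages: beautifulsoup4 is imported as `bs4`, and fake-useragent as `fake_useragent`.

```python
from importlib.util import find_spec

# pip distribution name -> importable module name
REQUIRED = {
    "tqdm": "tqdm",
    "fake-useragent": "fake_useragent",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All dependencies are available.")
```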

Issues

  1. The media sources see a high volume of traffic at night, so we suggest running MediaScrapper at another time.
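
If running at a quieter time is not possible, one common workaround is to retry failed requests with an increasing delay. The helper below is a generic sketch and is not taken from the notebook: `with_retries` and its parameters are hypothetical, and it catches every exception for brevity, whereas real code would more likely catch only `requests.exceptions.RequestException`.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

A download call could then be wrapped as `with_retries(lambda: requests.get(url, timeout=10))`, where `url` is whichever media URL is being fetched.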

Disclaimer

Sharing adult content may be against the law. This media scraper is intended purely for personal academic purposes, and the author accepts no liability for legal issues arising from non-personal or non-academic use.

License: MIT License


Languages

Language: Jupyter Notebook 100.0%