Image Crawler

Crawl images from Baidu, Bing, and Google (BBG)
by keyword using a browser.

1. Install Dependencies

1.1 Create a virtual environment using conda.
Only tested with Python 3.6 and 3.7.

conda create --name crawler python=3.7
source activate crawler

1.2 Download the browser driver for Selenium.
Find and download the driver for the browser you use.
https://www.seleniumhq.org/download/

1.3 Add the driver's folder to the system PATH.
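
To confirm that the driver is actually visible on the PATH, a quick check with Python's standard library might look like the sketch below. The driver names listed (chromedriver, geckodriver, msedgedriver) are assumptions for illustration; use the executable name of whichever driver you downloaded. The helper find_drivers is hypothetical and not part of this repo.

```python
import shutil

# Assumed driver executable names (Chrome, Firefox, Edge); adjust as needed.
DRIVER_NAMES = ('chromedriver', 'geckodriver', 'msedgedriver')


def find_drivers(names=DRIVER_NAMES):
    """Return a dict mapping each driver name found on PATH to its full path."""
    found = {}
    for name in names:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == '__main__':
    drivers = find_drivers()
    if drivers:
        for name, path in drivers.items():
            print('{}: {}'.format(name, path))
    else:
        print('No known driver found on PATH.')
```

If nothing is reported, the folder from step 1.3 is probably not on the PATH of the current shell session.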

2. Get Code

Clone the repo and install packages.

git clone https://github.com/quqixun/ImageCrawler.git
cd ImageCrawler
pip install -r requires.txt

Run the test code to crawl my favourite pandas from BBG.

cd src
python panda.py

3. Explanation

ImageCrawler: obtains the HTTP paths (URLs) of the desired images;
ImageDownloader: downloads the images to a local directory.

import os

from image_crawler import ImageCrawler
from image_downloader import ImageDownloader

4 parameters for ImageCrawler and ImageDownloader:

keyword: you know, the keyword;
n_scroll: number of times to scroll in the browser;
link_save_dir: holds all links of images;
image_save_dir: where to find all cute pandas.

n_scroll = 5
keyword = 'panda'

link_save_dir = os.path.join('../data/links', keyword)
image_save_dir = os.path.join('../data/images', keyword)

Crawl images' links using Baidu

engine = 'baidu'
baidu_links_name = 'baidu_links.csv'

baidu_ic = ImageCrawler(engine)
baidu_ic.run(keyword, n_scroll)
baidu_ic.save_links(link_save_dir, baidu_links_name)

Crawl images' links using Bing

engine = 'bing'
bing_links_name = 'bing_links.csv'

bing_ic = ImageCrawler(engine)
bing_ic.run(keyword, n_scroll)
bing_ic.save_links(link_save_dir, bing_links_name)

Crawl images' links using Google

engine = 'google'
google_links_name = 'google_links.csv'

google_ic = ImageCrawler(engine)
google_ic.run(keyword, n_scroll)
google_ic.save_links(link_save_dir, google_links_name)
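
The three engine blocks above follow the same pattern, so they could be folded into one loop. The sketch below is one possible way to do that, using only the calls shown above (the constructor taking an engine name, run, and save_links). The helper crawl_links and its crawler_cls argument are hypothetical names introduced here, not part of the repo.

```python
import os


def crawl_links(crawler_cls, keyword, n_scroll, link_save_dir,
                engines=('baidu', 'bing', 'google')):
    """Run one crawler per engine and save its links as '<engine>_links.csv'.

    crawler_cls is expected to behave like ImageCrawler as used above:
    crawler_cls(engine), .run(keyword, n_scroll), .save_links(dir, name).
    Returns the list of link file paths that were requested to be saved.
    """
    saved = []
    for engine in engines:
        links_name = '{}_links.csv'.format(engine)
        crawler = crawler_cls(engine)
        crawler.run(keyword, n_scroll)
        crawler.save_links(link_save_dir, links_name)
        saved.append(os.path.join(link_save_dir, links_name))
    return saved
```

With the imports and parameters from above, the call would be crawl_links(ImageCrawler, keyword, n_scroll, link_save_dir). Passing the class as an argument keeps the helper importable without a browser, which makes it easy to try with a stand-in class.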

Download images

ider = ImageDownloader(link_save_dir)
ider.run(image_save_dir)
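
Once the download finishes, a small check can report how many image files ended up in image_save_dir. This is a minimal sketch that assumes the downloader writes ordinary image files somewhere under that directory; the extension list and the helper count_images are illustrative assumptions, not part of the repo.

```python
import os

# Assumed set of image extensions; extend it if your results use others.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp')


def count_images(image_dir, extensions=IMAGE_EXTENSIONS):
    """Count files under image_dir (recursively) with an image extension."""
    count = 0
    for _root, _dirs, files in os.walk(image_dir):
        for name in files:
            # str.endswith accepts a tuple of suffixes
            if name.lower().endswith(extensions):
                count += 1
    return count
```

For example, print(count_images(image_save_dir)) after ider.run(image_save_dir) gives a quick sanity check that the download produced files.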

4. Panda

Here is a panda we found.
