QianyanTech / Image-Downloader

Download images from Google, Bing, and Baidu.

Not downloading any images

Ajithbalakrishnan opened this issue:

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.0:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.
```
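The log prints the exact query URL the scraper opens. Pasting that URL into a browser on the same machine (through the same proxy) is a quick way to see what Google actually returns. The sketch below is a hypothetical helper, not Image-Downloader's own code; it only rebuilds a URL of the same shape as the logged one, assuming the parameter order shown in the log.

```python
from urllib.parse import urlencode


def google_image_query_url(keywords: str, safe: str = "off") -> str:
    """Rebuild a Google Images query URL in the shape the script logs.

    Hypothetical helper, not Image-Downloader's own code: the parameter
    order simply mirrors the logged "Query URL" line.
    """
    params = {
        "tbm": "isch",   # image search
        "hl": "en",      # interface language
        "q": keywords,   # search terms, percent-encoded by urlencode()
        "safe": safe,    # "off" matches "Safe Mode: False" in the log
    }
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(google_image_query_url("apple"))
```

Multi-word keywords are encoded with `+` for spaces, so `red apple` becomes `q=red+apple`.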

I also tried the GUI, but it doesn't work either. Please guide me.

@Ajithbalakrishnan I believe you have a typo in `127.0.0.0:1080`; it should be `127.0.0.1:1080`.

@sczhengyabin Thanks for your quick reply, but I have tried every combination and I get the same result.

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.1:1080 apple

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
/home/ajith/miniconda3/lib/python3.7/site-packages/selenium-4.0.0a5-py3.7.egg/selenium/webdriver/remote/webdriver.py:640: UserWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
warnings.warn("find_elements_by_* commands are deprecated. Please use find_elements() instead")
Find 0 images.

== 0 out of 0 crawled images urls will be used.

Finished.
```

I tried the same thing with the GUI as well, and got the same result.

@Ajithbalakrishnan I can download images using exactly the same arguments as yours.
It's more likely to be a network issue.
Maybe your network is too slow, or the proxy server has an internal error.
In my tests, when my network has trouble reaching Google, I get exactly the same output as the one you posted.
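One way to separate a network problem from a scraper problem is to fetch the logged query URL with plain Python, outside Selenium. The sketch below is an illustrative check using only the standard library; the helper name `can_reach` is hypothetical. Note that `urllib` does not speak SOCKS, so this tests the direct connection (or an HTTP proxy set via environment variables), not the `--proxy_socks5` path.

```python
import urllib.error
import urllib.request


def can_reach(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP(S) GET to `url` succeeds within `timeout` seconds."""
    # A browser-like User-Agent; some sites reject the default urllib one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


if __name__ == "__main__":
    print(can_reach("https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off"))
```

If this returns `False` while other sites work, the problem sits between the machine and Google rather than inside Image-Downloader.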

@sczhengyabin My network connection is fine. But I'm working on Ubuntu in an Anaconda environment; I hope that isn't a problem. I installed the requirements through pip.

@Ajithbalakrishnan Try using chrome mode, which lets you watch the actions in the Chrome browser and see where it goes wrong.

@sczhengyabin I tried chrome mode in the GUI; please see the result below. Chrome popped up for a second, but then it closed. I also checked ChromeDriver, and its version matches my Chrome version.
[Screenshot from 2020-04-26 21-31-30]

@Ajithbalakrishnan No clue yet. Does the Bing engine work?

@sczhengyabin Nope, same result. Chrome doesn't show the search results. I checked my internet connection, and the network is fine.
[Screenshot from 2020-04-27 00-39-32]

@sczhengyabin Please share the dependencies and versions that you used.

@Ajithbalakrishnan

requests==2.18.4
selenium==3.141.0
PyQt5==5.14.2

generated using pipreqs
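The traceback in the first log shows `selenium-4.0.0a5` installed, while the versions above pin `selenium==3.141.0`. A quick way to spot such mismatches is to compare installed versions against these pins. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the reporter's environment was Python 3.7, where the `importlib_metadata` backport would be needed); `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the maintainer reported (generated with pipreqs).
MAINTAINER_PINS = {
    "requests": "2.18.4",
    "selenium": "3.141.0",
    "PyQt5": "5.14.2",
}


def compare_versions(pins, lookup=version):
    """Map each package to (pinned, installed); installed is None if missing."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


if __name__ == "__main__":
    for name, (pinned, installed) in compare_versions(MAINTAINER_PINS).items():
        flag = "" if installed == pinned else "  <-- differs"
        print(f"{name}: pinned {pinned}, installed {installed}{flag}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.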

It still looks like a network issue to me, at least for this project.

To verify, you can set up the proxy using `proxychains` rather than this project's proxy option.

```
# config in /etc/proxychains.conf
proxychains python3 image_downloader.py ...
```

```
proxychains python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images --proxy_socks5 127.0.0.1:1080 apple
ProxyChains-3.1 (http://proxychains.sf.net)

Scraping From Google Image Search ...

Keywords: apple
Number: 100
Face Only: False
Safe Mode: False
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=apple&safe=off
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-request| localhost
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-response|: localhost does not exist
|DNS-request| localhost
|S-chain|-<>-127.0.0.1:1080-<--timeout
|DNS-response|: localhost does not exist
```
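The repeated `|S-chain|-<>-127.0.0.1:1080-<--timeout` lines suggest proxychains could not get through a working proxy at that address. A first check is whether anything accepts TCP connections there at all. The sketch below is illustrative; `parse_proxy` and `proxy_is_listening` are hypothetical names, and a successful connection only shows that something is listening, not that it is a working SOCKS5 proxy.

```python
import socket


def parse_proxy(value: str):
    """Split a 'host:port' string (the form passed to --proxy_socks5)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def proxy_is_listening(value: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the proxy address succeeds."""
    host, port = parse_proxy(value)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: nothing usable at that address.
        return False


if __name__ == "__main__":
    print(proxy_is_listening("127.0.0.1:1080"))
```

If this prints `False`, no proxy client is running on that port, and neither proxychains nor `--proxy_socks5` can work until one is.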
My proxychains config file is attached below.

(Attachment: proxychains.zip)

I tried changing the line `socks4 127.0.0.1 9050` in the proxychains config file to point at 127.0.0.1 port 1080, but it didn't help.

@Ajithbalakrishnan
The proxychains config should contain `socks5 127.0.0.1 1080`.
If you can use proxychains to download other things (e.g. with apt-get), then it's an issue with Image-Downloader; otherwise, something is definitely wrong with your SOCKS5 proxy configuration.

@sczhengyabin It's working now. I made some changes in the /etc/proxychains.conf file:

  1. Changed `strict_chain` to `dynamic_chain`.
  2. Added one more line at the end: `socks5 127.0.0.1 9050`.
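To double-check which chain mode and proxies a proxychains config actually enables after edits like these, a small parser can summarize it. This is a hypothetical helper, assuming the usual proxychains.conf layout: a bare chain-mode keyword (`strict_chain`, `dynamic_chain`, `random_chain`), `#` comments, and a `[ProxyList]` section whose entries read `type host port`.

```python
def summarize_proxychains_conf(text: str):
    """Return (chain_mode, proxies) parsed from proxychains.conf text."""
    chain_modes = {"strict_chain", "dynamic_chain", "random_chain"}
    mode = None
    proxies = []
    in_list = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line == "[ProxyList]":
            in_list = True
        elif in_list:
            parts = line.split()
            if len(parts) >= 3:  # optional user/password fields are ignored
                proxies.append((parts[0], parts[1], int(parts[2])))
        elif line in chain_modes:
            mode = line  # records the last chain-mode keyword seen
    return mode, proxies


if __name__ == "__main__":
    with open("/etc/proxychains.conf") as f:
        print(summarize_proxychains_conf(f.read()))
```

The switch to `dynamic_chain` matters here: per the comments in the stock proxychains.conf, `strict_chain` requires every listed proxy to be online, while `dynamic_chain` skips dead proxies as long as at least one works, so the default `socks4 127.0.0.1 9050` entry no longer blocks the chain.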

Then I installed Tor and PySocks in my environment:


```
sudo apt-get install tor
pip install PySocks
```


Since the SOCKS5 port has changed, the command becomes:

```
python3 image_downloader.py --engine Google --driver chrome_headless --max-number 100 --output ./images/kerlaflood --proxy_socks5 127.0.0.1:9050 kerlaflood2018
```
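Port 9050 is Tor's default SOCKS port. Before running the downloader, the Tor proxy can be tested from plain Python with `requests` plus PySocks, which adds support for `socks5://` and `socks5h://` proxy URLs. A minimal sketch; `socks5_proxies` is a hypothetical helper, and this is not how Image-Downloader itself consumes `--proxy_socks5`.

```python
def socks5_proxies(value: str, remote_dns: bool = True) -> dict:
    """Build a `proxies` mapping for requests from a 'host:port' string.

    'socks5h' lets the proxy resolve hostnames (DNS goes through Tor);
    plain 'socks5' resolves them locally first. Both need PySocks.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{value}"
    return {"http": url, "https": url}


# Usage (needs network access, Tor on 127.0.0.1:9050, and
# `pip install requests PySocks`):
#   import requests
#   r = requests.get("https://www.google.com/",
#                    proxies=socks5_proxies("127.0.0.1:9050"), timeout=30)
#   print(r.status_code)
```

Resolving DNS through the proxy (`socks5h`) is usually the better choice here, since the earlier proxychains log shows DNS lookups failing alongside the connection timeouts.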

I hope this is helpful for others. Sorry for taking up your valuable time.

@Ajithbalakrishnan It's ok, as long as the problem is solved.

FWIW, I have a similar issue, but only with Google. I think the reason is that Google shows a "Before you continue to Google" page; that's what I briefly see in the interactive Chrome option before it closes.

Using Bing instead works.
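If Google's consent interstitial is the cause, the scraper never sees a results page and therefore finds 0 images. A heuristic check like the one below could flag that case. It is a hypothetical sketch, not part of Image-Downloader, and it assumes the consent page is served from a `consent.google.*` host and/or contains the "Before you continue to Google" heading; Google may change either at any time.

```python
from urllib.parse import urlparse


def looks_like_google_consent(current_url: str, page_text: str = "") -> bool:
    """Heuristic: did the browser land on Google's consent page instead of results?"""
    host = urlparse(current_url).hostname or ""
    if host.startswith("consent.google."):
        return True
    return "before you continue to google" in page_text.lower()


# With Selenium, the inputs would be driver.current_url and driver.page_source.
```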