QianyanTech / Image-Downloader

Download images from Google, Bing, Baidu.

Problem downloading Google images through a proxy server on Linux

yfzmk2013 opened this issue

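At present, on Linux, downloading Google images through a proxy server does not run correctly from the command line.

@yfzmk2013 Please be more specific.
For example: your system environment, how you ran the code, whether you modified the code, what the failure looks like, and what error was reported.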

Traceback (most recent call last):
  File "image_downloader_google.py", line 100, in <module>
    browser="phantomjs")
  File "/home/yanhao/project/DengHong_Git/Image-Downloader/crawler.py", line 254, in crawl_image_urls
    service_args=phantomjs_args, desired_capabilities=dcap)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 58, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message:

<title>502 - No server or forwarder data received (Privoxy@localhost)</title>
502
    <h1>
      This is <a href="http://www.privoxy.org/">Privoxy</a> 3.0.24 on localhost (127.0.0.1), port 8118<!-- @if-can-toggle-start -->,
      enabled<!-- if-can-toggle-end@ -->
    </h1>

  </td>
</tr>
<tr>
  <td class="warning" colspan=2>
    <h2>No server or forwarder data received</h2>
      <p>Your request for <a href="http://127.0.0.1:42923/wd/hub/session"><b>http://127.0.0.1:42923/wd/hub/session</b></a>
      could not be fulfilled, because the connection to <b>127.0.0.1</b> (127.0.0.1) has been closed
      before Privoxy received any data for this request.
      </p>
      <p>This is often a temporary failure, so you might just
        <a href="http://127.0.0.1:42923/wd/hub/session">try again</a>.
     </p>
     <p>
      If you get this message very often, consider disabling
      <a href="http://config.privoxy.org/user-manual/config.html#CONNECTION-SHARING">connection-sharing</a>
      (which should be off by default). If that doesn't help, you may have to additionally
      disable support for connection keep-alive by setting
      <a href="http://config.privoxy.org/user-manual/config.html#KEEP-ALIVE-TIMEOUT">keep-alive-timeout</a>
      to 0.
     </p>
  </td>
</tr>

<tr>
  <td class="box" colspan="2">
    <h2>More Privoxy:</h2>
    <ul><li><a href="http://config.privoxy.org/">Privoxy main page</a></li><li><a href="http://config.privoxy.org/show-status">View &amp; change the current configuration</a></li><li><a href="http://config.privoxy.org/show-version">View the source code version numbers</a></li><li><a href="http://config.privoxy.org/show-request">View the request headers</a></li><li><a href="http://config.privoxy.org/show-url-info">Look up which actions apply to a URL and why</a></li><li><a href="http://config.privoxy.org/user-manual/">Documentation</a></li></ul>
  </td>
</tr>

<tr>
  <td class="info" colspan="2">

   <h2>Support and Service:</h2>
    <p>
      The Privoxy Team values your feedback. To provide you with the best support,
      we ask that you:
    </p>
    <ul>
      <li>
        use the <a href="http://sourceforge.net/tracker/?group_id=11118&amp;atid=211118">Support Tracker</a>
        if you need help.
      </li>
      <li>
        submit ads and configuration related problems with the actions files through the
        <a href="http://sourceforge.net/tracker/?group_id=11118&amp;atid=460288">Actionsfile Feedback Tracker</a>.
      </li>
      <li>
        submit bugs only through the
        <a href="http://sourceforge.net/tracker/?group_id=11118&amp;atid=111118">Bug Tracker</a>.
        Please make sure that the bug has not been submitted yet.
      </li>
      <li>
        submit feature requests only through the
        <a href="http://sourceforge.net/tracker/?atid=361118&amp;group_id=11118&amp;func=browse">Feature
        Request Tracker</a>.
      </li>
      <li>
        read the <a title="Contacting the developers, Bug Reporting and Feature Requests"
         href="http://config.privoxy.org/user-manual/contact.html">instructions in the User Manual</a>
        to make sure your request contains all the information we need.
      </li>
    </ul>
    <p>
     If you want to support the Privoxy Team, please have a look at the FAQ to learn how to
     <a href="http://www.privoxy.org/faq/general.html#PARTICIPATE">participate</a>
     or to <a href="http://www.privoxy.org/faq/general.html#DONATE">donate</a>.
    </p>

  </td>
</tr>

The system is Ubuntu 16.04.

I just tested it, and it works.
Could you try the way I invoke it as a reference?
@yfzmk2013

Command: curl ip.gs
Current IP / 当前 IP: 45.62.105.15
ISP / 运营商: it7.net
City / 城市: Los Angeles California
Country / 国家: United States
IP.GS is now IP.SB, please visit https://ip.sb/ for more IP information, ip.gs will only use for curl purpose. / IP.GS 已更新至 IP.SB 请访问 https://ip.sb/ 获取更多信息, ip.gs 域名仅作 curl 使用
Please join Telegram group https://t.me/sbfans if you have any issues. / 如有问题,请加入 Telegram 群 https://t.me/sbfans

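But I cannot ping www.google.com, even though I can reach Google in my browser. I am not sure whether, on your side, the command line has already been set up to reach Google.

@yfzmk2013
I do bypass the firewall at the router level, but that should not make a difference. If the socks5 proxy passed in the arguments were wrong, the program would not run properly either.

Also, when I developed this program I tested it under the same conditions as yours, and it was not affected.

Please tell me the exact Python version, the versions of the libraries you use, and the PhantomJS version, and I will see whether I can reproduce it.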
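The version details requested here can be gathered in one go. The following is only a minimal sketch: the helper name `collect_versions` is hypothetical and not part of Image-Downloader, and it assumes the `phantomjs` executable is on `PATH` (otherwise it just reports that it was not found).

```python
import platform
import shutil
import subprocess


def collect_versions():
    """Gather Python, OS, selenium and PhantomJS version details into a dict."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }

    # selenium may not be installed in the interpreter running this script.
    try:
        import selenium
        info["selenium"] = selenium.__version__
    except ImportError:
        info["selenium"] = "not installed"

    # Look for the phantomjs executable on PATH and ask it for its version.
    exe = shutil.which("phantomjs")
    if exe is None:
        info["phantomjs"] = "not found on PATH"
    else:
        try:
            result = subprocess.run(
                [exe, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=10,
            )
            info["phantomjs"] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            info["phantomjs"] = "could not run: {}".format(exc)
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print("{}: {}".format(name, value))
```

Pasting the printed lines into the issue gives the maintainer the information needed to attempt a reproduction.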

@sczhengyabin
Hello,
this is how I call the function:

    import crawler

    # Search keyword (a person's name)
    name_st = '里皮'
    crawled_urls = crawler.crawl_image_urls(keywords=name_st,
                                            engine='Google', max_number=10000,
                                            face_only=False, safe_mode=True,
                                            proxy_type="socks5", proxy="127.0.0.1:1080",
                                            browser="phantomjs")

Python version: 3.5.2; PhantomJS version: 2.1.1

Error output:

keywords: 里皮
Number: 10000
Face Only: False
Safe Mode: True
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=%E9%87%8C%E7%9A%AE&safe=on
Traceback (most recent call last):
  File "image_downloader_google.py", line 100, in <module>
    browser="phantomjs")
  File "/home/yanhao/project/DengHong_Git/Image-Downloader/crawler.py", line 254, in crawl_image_urls
    service_args=phantomjs_args, desired_capabilities=dcap)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 58, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message:

<title>502 - No server or forwarder data received (Privoxy@localhost)</title>
502
    <h1>
      This is <a href="http://www.privoxy.org/">Privoxy</a> 3.0.24 on localhost (127.0.0.1), port 8118<!-- @if-can-toggle-start -->,
      enabled<!-- if-can-toggle-end@ -->
    </h1>

  </td>
</tr>
<tr>
  <td class="warning" colspan=2>
    <h2>No server or forwarder data received</h2>
      <p>Your request for <a href="http://127.0.0.1:48599/wd/hub/session"><b>http://127.0.0.1:48599/wd/hub/session</b></a>
      could not be fulfilled, because the connection to <b>127.0.0.1</b> (127.0.0.1) has been closed
      before Privoxy received any data for this request.
      </p>
      <p>This is often a temporary failure, so you might just
        <a href="http://127.0.0.1:48599/wd/hub/session">try again</a>.
     </p>
     <p>
      If you get this message very often, consider disabling
      <a href="http://config.privoxy.org/user-manual/config.html#CONNECTION-SHARING">connection-sharing</a>
      (which should be off by default). If that doesn't help, you may have to additionally
      disable support for connection keep-alive by setting
      <a href="http://config.privoxy.org/user-manual/config.html#KEEP-ALIVE-TIMEOUT">keep-alive-timeout</a>
      to 0.
     </p>
  </td>
</tr>


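@yfzmk2013 Sorry, I created a fresh virtualenv to test, and it still works without problems. I searched for the error message, and it may be a proxy problem.
I don't know which SS software you are using. I tried a proxy started locally with sslocal, and it worked. A proxy on the router also worked, and an SS proxy running inside a Windows virtual machine worked too.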

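I ran into the same problem on Windows 10: after connecting to the VPN, running the code kept raising selenium.common.exceptions.WebDriverException: Message: . I struggled with it for a long time. At first I thought the web proxy was at fault, because I could not ping Google. Eventually I found that the proxy has several modes. After looking up the differences and switching from global proxy mode to PAC mode, the program ran normally.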
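In the 502 pages quoted earlier, the URL Privoxy failed to fetch was http://127.0.0.1:<port>/wd/hub/session, which is Selenium's own connection to the local PhantomJS driver rather than a request to Google. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a system-wide or "global" proxy setting (for example an `http_proxy` environment variable pointing at Privoxy on port 8118) was also capturing loopback traffic. Under that assumption, a minimal sketch of a workaround is to exempt loopback addresses through the conventional `no_proxy` / `NO_PROXY` variables before starting the crawler. The helper name `exempt_localhost_from_proxy` is hypothetical, and whether a given Selenium version honors these variables is not verified here.

```python
import os

# Loopback names that should never be sent through an HTTP proxy.
LOCAL_HOSTS = ("127.0.0.1", "localhost")


def exempt_localhost_from_proxy(environ=os.environ):
    """Append loopback hosts to no_proxy/NO_PROXY, keeping existing entries."""
    for key in ("no_proxy", "NO_PROXY"):
        # Split the current comma-separated value, dropping empty items.
        entries = [e.strip() for e in environ.get(key, "").split(",") if e.strip()]
        for host in LOCAL_HOSTS:
            if host not in entries:
                entries.append(host)
        environ[key] = ",".join(entries)
    return environ["no_proxy"]


if __name__ == "__main__":
    # Call this before crawler.crawl_image_urls(...) so that the driver
    # session request to 127.0.0.1 is not routed through the proxy.
    print(exempt_localhost_from_proxy())
```

The socks5 proxy passed via `proxy_type` and `proxy` is unaffected by this, since it is handed to PhantomJS as an argument rather than read from the environment.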