Error on Email Harvest from Google Search

Question

ondrejtoral opened this issue 7 years ago · comments

ondrejtoral commented 7 years ago

Please provide the following details.

Host System

OS : Ubuntu 16.04.03 LTS
Python version (python --version) : 2.7.12
Pip version (pip --version) : 9.0.1
Output of pip freeze : [https://gist.github.com/ondrejtoral/bf4a0b3f5120989eab811f084dd96512]

Error Description

I ran python 2.7 Belati.py -c mega.cz (and tried a couple of other domains) as root. The Google search is blocked, and when Belati tries to harvest emails from Google I get this error:

[*] Perfoming Email Harvest from Google Search...
Error code: 503
[-] Not found or Unavailable. None
Traceback (most recent call last):
  File "Belati.py", line 432, in <module>
    BelatiApp = Belati()
  File "Belati.py", line 155, in __init__
    self.harvest_email_search(domain, proxy)
  File "Belati.py", line 323, in harvest_email_search
    self.db.insert_email_result(self.project_id, util.clean_list_string(harvest_result))
  File "/home/trl/Belati/plugins/util.py", line 74, in clean_list_string
    return str(", ".join(text))
TypeError: can only join an iterable

Aan · Answer 1 · Thu Oct 12 2017 14:01:59 GMT+0800 (China Standard Time)

Hi, your IP may have been blocked, since the search is returning a 503 error code. Try using a proxy; Belati supports both a proxy list and a single proxy.

Aan · Answer 2 · Thu Oct 12 2017 14:23:03 GMT+0800 (China Standard Time)

I've updated source code for this problem. Please use git pull.
Thanks!
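
For readers hitting the same traceback: the TypeError in the question is raised because the blocked search hands None to ", ".join(...) instead of a list. A minimal sketch of one possible guard follows, assuming the helper only needs to turn a list of results into a comma-separated string. It is an illustration, not necessarily the change that was pushed to the repository.

```python
def clean_list_string(items):
    """Join items into a comma-separated string.

    Returns an empty string when there is nothing to join, for example
    when a blocked search (HTTP 503) produced None instead of a list.
    """
    if not items:
        return ""
    # str() keeps the join safe even if an item is not already a string.
    return ", ".join(str(item) for item in items)
```

With this guard, a failed harvest stores an empty result rather than aborting the whole run.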

ondrejtoral · Answer 3 · Thu Oct 12 2017 14:42:00 GMT+0800 (China Standard Time)

Hi, thank you for the quick fix! It went well until searching for PDFs:
[*] Searching PDF Document...
Error code: 503
Traceback (most recent call last):
  File "Belati.py", line 430, in <module>
    BelatiApp = Belati()
  File "Belati.py", line 165, in __init__
    self.harvest_document(domain, proxy)
  File "Belati.py", line 338, in harvest_document
    public_doc.init_crawl(domain_name, proxy_address, self.project_id)
  File "/home/trl/Belati/plugins/harvest_public_document.py", line 52, in init_crawl
    self.harvest_public_doc(domain, "pdf", proxy_address)
  File "/home/trl/Belati/plugins/harvest_public_document.py", line 70, in harvest_public_doc
    data = re.findall(regex, data)
  File "/usr/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Aan · Answer 4 · Thu Oct 12 2017 14:46:29 GMT+0800 (China Standard Time)

Ah, I see. I will update it soon. Thanks for reminding me about this issue. Any other problems?
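
The second traceback has the same root cause: after the 503, the page body is None, and re.findall() only accepts a string. A minimal sketch of a guard is shown below. The helper name find_document_links and the href pattern are hypothetical and chosen only for illustration; the actual regex in harvest_public_document.py is not shown in this thread.

```python
import re


def find_document_links(html, extension):
    """Return links ending in the given extension, or [] if no page was fetched."""
    # A blocked request (e.g. HTTP 503) may leave html as None.
    if not html:
        return []
    # Hypothetical pattern: any double-quoted href ending in ".<extension>".
    pattern = r'href="([^"]+\.' + re.escape(extension) + r')"'
    return re.findall(pattern, html)
```

Returning an empty list lets the caller continue with the next document type instead of crashing.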

ondrejtoral · Answer 5 · Thu Oct 12 2017 14:55:37 GMT+0800 (China Standard Time)

So far so good, if I find something else, I will post another issue.
I need to find some proxies; without the Google search, the report is very basic.
Thank you for the great work!

Aan · Answer 6 · Thu Oct 12 2017 15:11:58 GMT+0800 (China Standard Time)

Okay. Thanks for your report. I will update and close this issue :)