m8sec / pymeta

Utility to download and extract document metadata from an organization. This technique can be used to identify domains, usernames, software/version numbers, and naming conventions.

Error when files are found

dabula-s opened this issue

[*] Starting web search
[*] Extension  |  Number of New Links Found  |  Search URL
[*] pdf :  0  https://www.google.com/search?q=site:opu.ua+filetype:pdf&num=100
[*] pdf : 10  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf
[*] pdf : 10  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf&first=11
[*] pdf :  9  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf&first=21
[*] pdf : 10  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf&first=31
[*] pdf : 10  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf&first=41
[*] pdf :  1  http://www.bing.com/search?q=site:opu.ua%20filetype:pdf&first=51
[*] xls :  0  https://www.google.com/search?q=site:opu.ua+filetype:xls&num=100
[*] xls :  6  http://www.bing.com/search?q=site:opu.ua%20filetype:xls
[*] xls :  0  http://www.bing.com/search?q=site:opu.ua%20filetype:xls&first=7
[*] xlsx:  0  https://www.google.com/search?q=site:opu.ua+filetype:xlsx&num=100
[*] xlsx:  0  http://www.bing.com/search?q=site:opu.ua%20filetype:xlsx
[*] doc :  0  https://www.google.com/search?q=site:opu.ua+filetype:doc&num=100
[*] doc : 20  http://www.bing.com/search?q=site:opu.ua%20filetype:doc
[*] doc : 20  http://www.bing.com/search?q=site:opu.ua%20filetype:doc&first=21
[*] doc : 10  http://www.bing.com/search?q=site:opu.ua%20filetype:doc&first=41
[*] docx:  0  https://www.google.com/search?q=site:opu.ua+filetype:docx&num=100
[*] docx:  0  http://www.bing.com/search?q=site:opu.ua%20filetype:docx
[*] ppt :  0  https://www.google.com/search?q=site:opu.ua+filetype:ppt&num=100
[*] ppt :  4  http://www.bing.com/search?q=site:opu.ua%20filetype:ppt
[*] ppt :  0  http://www.bing.com/search?q=site:opu.ua%20filetype:ppt&first=5
[*] pptx:  0  https://www.google.com/search?q=site:opu.ua+filetype:pptx&num=100
[*] pptx:  0  http://www.bing.com/search?q=site:opu.ua%20filetype:pptx
[*] Setting up folder for downloads
[*] Downloading files from the internet
[*] Extracting Metadata from folder: ./opu._meta/, to ./pymeta_opu..csv
Traceback (most recent call last):
  File "pymeta.py", line 226, in <module>
    main(args)
  File "pymeta.py", line 199, in main
    pyme.extract_csv(args.file_dir, outfile)
  File "pymeta.py", line 112, in extract_csv
    meta = getoutput("exiftool -t '{}'".format(file_dir + f)).splitlines()
  File "/usr/local/lib/python3.7/subprocess.py", line 605, in getoutput
    return getstatusoutput(cmd)[1]
  File "/usr/local/lib/python3.7/subprocess.py", line 586, in getstatusoutput
    data = check_output(cmd, shell=True, text=True, stderr=STDOUT)
  File "/usr/local/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/usr/local/lib/python3.7/subprocess.py", line 474, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/local/lib/python3.7/subprocess.py", line 926, in communicate
    stdout = self.stdout.read()
  File "/usr/local/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 911: invalid continuation byte

Python: 3.7, 3.6
OS: slim-stretch, alpine3.8

commented

Hi @SeriyVol4ishe,

Thanks for reporting the issue in so much detail! The latest pull request by @fang0654 fixes it: error handling has been added, and the program no longer exits when an invalid character is found.

Please let me know if this does not fix your issue. I will continue looking into this behind the scenes to see if any metadata can be salvaged when these errors occur!

-m8r0wn
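
For readers hitting the same traceback: `getoutput` decodes the `exiftool` output as strict UTF-8, so a single non-UTF-8 byte (here `0xc4`) aborts the whole run. The following is a minimal sketch of one way to keep the readable fields, assuming `exiftool` is on the PATH. It is not the change made in the pull request mentioned above, and the helper names `decode_lines` and `exiftool_lines` are hypothetical, chosen only for illustration.

```python
import subprocess
from typing import List


def decode_lines(raw: bytes) -> List[str]:
    """Decode tool output as UTF-8, replacing undecodable bytes with U+FFFD
    instead of raising UnicodeDecodeError."""
    return raw.decode("utf-8", errors="replace").splitlines()


def exiftool_lines(path: str) -> List[str]:
    """Run `exiftool -t` on one file and return its output lines.

    Hypothetical helper: the argument list avoids shell quoting issues with
    unusual file names, and the output is captured as raw bytes so that
    decoding is under our control.
    """
    proc = subprocess.run(
        ["exiftool", "-t", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as getoutput does
    )
    return decode_lines(proc.stdout)


if __name__ == "__main__":
    # Made-up sample: 0xc4 followed by an ASCII byte is not valid UTF-8,
    # which is the same class of error shown in the traceback.
    sample = b"Author\tJ\xc4ger\nTitle\tReport"
    for line in decode_lines(sample):
        print(line)
```

With this approach the damaged character shows up as a replacement character in the affected field, while the remaining tab-separated fields stay usable.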