recrm / ArchiveTools

A collection of tools for archiving and analysing the internet.

Unknown mime type - Questions

nihelmasell opened this issue

Hi, I get the following message when extracting a dump:

Count of unknown mime type.
{'': 17,
'application/binary': 7,
'application/font-woff2': 6,
'application/x-font-ttf': 6,
'application/x-font-woff': 1,
'application/x-javascript': 3458,
'binary/octet-stream': 17,
'font/truetype': 7,
'font/ttf': 1,
'font/woff': 2,
'font/woff2': 156,
'font/x-woff': 7,
'image/x-icon': 9,
'text/javascript': 34}

  1. Can I be sure all files inside the webcapture are being extracted? (like when you extract a zip file, for example)
  2. This is the only script I have found that can perform full dumps. Do you know of any others? (My only worry, related to the previous question, is that not all files were extracted.)

Best regards, and forgive my ignorance.

It's been a while since I last looked at this project, but it should extract all of the files. The issue with MIME types is that if the program can't infer the type correctly, the file might not be extracted correctly and can end up unreadable. I am not aware of any other programs that can dump files, which is why I wrote this one in the first place. However, the project is getting a bit old at this point, so there could be other projects I'm not aware of (or even better forks of this one).
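For readers who hit the same warning, here is a minimal sketch of one way to map the MIME types from the report above to file extensions. It is not taken from ArchiveTools: the names `FALLBACK_EXTENSIONS` and `extension_for` are hypothetical, and the table is an assumption based only on the types listed in the question. It checks the table first, then falls back to Python's standard `mimetypes` module, and finally to a generic `.bin`.

```python
import mimetypes

# Hypothetical fallback table (an assumption, not part of ArchiveTools),
# covering the non-standard types reported in the question.
FALLBACK_EXTENSIONS = {
    "application/x-javascript": ".js",
    "text/javascript": ".js",
    "application/font-woff2": ".woff2",
    "font/woff2": ".woff2",
    "application/x-font-woff": ".woff",
    "font/woff": ".woff",
    "font/x-woff": ".woff",
    "application/x-font-ttf": ".ttf",
    "font/truetype": ".ttf",
    "font/ttf": ".ttf",
    "image/x-icon": ".ico",
}


def extension_for(mime_type, default=".bin"):
    """Return a file extension for a MIME type, or `default` if unknown."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in FALLBACK_EXTENSIONS:
        return FALLBACK_EXTENSIONS[base]
    # Ask the standard library only when there is something to look up.
    guessed = mimetypes.guess_extension(base) if base else None
    return guessed or default


if __name__ == "__main__":
    for reported in ("application/x-javascript", "font/woff2", "image/x-icon", ""):
        print(repr(reported), "->", extension_for(reported))
```

An empty type (the `''` entry in the report) falls through to the default, so a file with no declared type would still be written under a generic extension rather than skipped.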