scraperwiki / messytables

NO LONGER USED - use the official version at https://github.com/okfn/messytables

Mime type detection fails with "Can't read SAT"

frabcus opened this issue

This spreadsheet:
http://www.fao.org/fileadmin/templates/worldfood/Reports_and_docs/Food_price_indices_data_deflated.xls

If I read it with messytables.any:

import messytables

filename = "Food_price_indices_data_deflated.xls"
tableset = messytables.any.any_tableset(open(filename), extension=".xls")
print tableset

I get this mime error:

$ ./messy_to_json.py 
Traceback (most recent call last):
  File "./messy_to_json.py", line 7, in <module>
    tableset = messytables.any.any_tableset(open(filename), extension=".xls")
  File "/usr/local/lib/python2.7/dist-packages/messytables/any.py", line 48, in any_tableset
    raise ValueError("Unrecognized MIME type: " + mimetype)
ValueError: Unrecognized MIME type: Composite Document File V2 Document, corrupt: Can't read SAT

It works fine if I do:

tableset = messytables.any.any_tableset(open(filename), mimetype="application/vnd.ms-excel", extension=".xls")
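To avoid hard-coding the MIME type in that workaround, it could be derived from the file extension with the standard library. This is only a sketch: guess_mimetype is a hypothetical helper, not part of messytables, and it trusts the extension rather than the file contents.

```python
import mimetypes


def guess_mimetype(filename, default=None):
    """Guess a MIME type from the file extension alone.

    Hypothetical helper, not part of messytables. It never opens the
    file, so it cannot hit the "Can't read SAT" detection problem, but
    it is wrong whenever the extension is wrong.
    """
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default
```

The result could then be passed as the mimetype argument in the call above, in place of the literal string.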

You need more than 64K of the file to get the right MIME type. In master this is horrifically jury-rigged by detecting the error message. Fixed, but it may not be stable across versions of magic (Ross reports application/somethingorother-corrupt).
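A rough sketch of the retry idea described here, not the code that is actually in master: sample the start of the file, and if the description mentions "corrupt", try again with a bigger sample. The sniff function, the detect callable and the two sample sizes are all assumptions made for illustration. Matching on the word "corrupt" is exactly the fragile part, since the wording may differ between versions of magic.

```python
# Assumed sample sizes: the comment above only says that more than 64K
# is needed, so the larger value is an illustrative guess.
SMALL_SAMPLE = 64 * 1024
LARGE_SAMPLE = 1024 * 1024


def sniff(fileobj, detect, sizes=(SMALL_SAMPLE, LARGE_SAMPLE)):
    """Describe a seekable file, retrying with more bytes if it looks corrupt.

    `detect` is any callable that maps a bytes sample to a description
    string. It is passed in so that this sketch does not depend on a
    particular magic binding.
    """
    result = None
    for size in sizes:
        fileobj.seek(0)
        result = detect(fileobj.read(size))
        # Fragile check: relies on the detector's wording for a bad sample.
        if "corrupt" not in result.lower():
            break
    fileobj.seek(0)  # rewind so the real parser starts from the beginning
    return result
```

With the python-magic binding installed, magic.from_buffer could be passed as detect. If every sample still looks corrupt, the last description is returned unchanged, so the caller can decide whether to raise.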