ClimateImpactLab / metacsv

Tools for documentation-aware data reading, writing, and analysis

Home Page: https://metacsv.readthedocs.io/en/latest/

Provide a function for just parsing a header from a file pointer

jrising opened this issue

We have text files with metacsv headers, but the rest of each file isn't in a format pandas can read. There should be some way to read just the header, leaving the file pointer at the end of the header. The syntax might look like this:

with open(filename, 'r') as fp:
    meta = metacsv.deparse(fp)
    # Do other stuff with fp

Or, it might be nice to fill in a passed-in dictionary, so that the syntax could be:

meta = {}  # put defaults in here
with metacsv.deparse(open(filename, 'r'), meta) as fp:
    pass  # Do other stuff with fp

I just created PR #2, which adds a new io function, read_header. It acts much like read_csv but returns a tuple of (Attributes, Variables, Coordinates) objects.

The first functionality you mentioned works:

with open(filename, 'r') as fp:
    attrs, variables, coords = metacsv.read_header(fp)
    # do other stuff with fp

See help(metacsv.read_header) for more documentation.
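To make the requested behavior concrete (consume the header, leave the file pointer on the first line of content), here is a minimal standalone sketch. It does not use metacsv and is not how read_header is implemented. It assumes the header is a YAML block opened by a `---` line and closed by a `...` line, and `split_header` is a hypothetical name chosen for illustration.

```python
import io


def split_header(fp, start="---", end="..."):
    """Return the text between the header delimiters (hypothetical helper).

    Assumes the header opens with a `start` line and closes with an `end`
    line. On return, fp is positioned at the first line after the header.
    """
    first = fp.readline()
    if first.strip() != start:
        raise ValueError("file does not begin with a header block")

    header_lines = []
    while True:
        # readline() is used instead of iterating over fp, so the position
        # of fp stays well defined for whoever reads from it next.
        line = fp.readline()
        if line == "":
            raise ValueError("header block is never closed")
        if line.strip() == end:
            break
        header_lines.append(line)
    return "".join(header_lines)


# An in-memory stand-in for a file whose body is not tabular.
sample = io.StringIO(
    "---\n"
    "author: A Person\n"
    "description: free-form model output\n"
    "...\n"
    "first line of non-tabular content\n"
    "second line\n"
)

header_text = split_header(sample)  # raw header text, delimiters excluded
remaining = sample.read()           # everything after the header
```

The returned text could then be handed to any YAML parser, while `fp` stays available for whatever custom reading the body needs.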

Currently, all metadata in metaCSV is stored in these three objects, and there is no single metadata object. Changing this would require restructuring the module, though I see how it might be useful. One could imagine a lightweight metadata object that hosts the Attributes, Variables, and Coordinates objects. The current API could be preserved by making every existing reference to these attributes point to the corresponding attribute on the metadata object. That would involve a decent amount of work, so I think I'll wait to see if there's more of a need for it.
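For what it's worth, the idea could look roughly like the sketch below. Everything here is hypothetical: `Metadata`, `Container`, and `_forward` are illustrative names, plain dicts stand in for the real Attributes, Variables, and Coordinates objects, and none of this is existing metacsv code.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical single object hosting the three metadata pieces."""
    attributes: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    coords: dict = field(default_factory=dict)


def _forward(name):
    """Build a property that reads and writes self.metadata.<name>."""
    def getter(self):
        return getattr(self.metadata, name)

    def setter(self, value):
        setattr(self.metadata, name, value)

    return property(getter, setter)


class Container:
    """Stand-in for a data container that keeps its old attribute names."""
    attributes = _forward("attributes")
    variables = _forward("variables")
    coords = _forward("coords")

    def __init__(self, metadata=None):
        self.metadata = metadata if metadata is not None else Metadata()


c = Container()
c.attributes["author"] = "A Person"  # old-style access still works
c.coords = {"region": None}          # assignment lands on the metadata object
```

Existing code that touches `c.attributes`, `c.variables`, or `c.coords` would keep working, while new code could pass `c.metadata` around as one unit.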

As soon as the tests pass, I'll merge it and push to PyPI so you can use it. Let me know if you think this is an acceptable solution.