google / weather-tools

Tools to make weather data accessible and useful.

Home Page:https://weather-tools.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Config values need to be typed on parsing.

alxmrs opened this issue · comments

The current config-value parser treats all values as strings. Instead, we need to try to infer types in a best-effort way. This is a pre-requisite step in order to fix #4.

Perhaps consider using protobuf for this? It is easy enough to write textproto files and then you get all the useful parsing that protobuf contains by default.

To be more specific: consider using a textproto for this. Protobuf can be binary or text, but for configuration, text is much better and can be easily edited with a text editor.

I have a few hesitations with using textprotos. Though, I'm open to using them if I can address my concerns.

  1. I'd like to support multiple file types for serialization. Right now we support JSON and ConfigParser. I could see textprotos fitting in as another format we support, but I'd prefer not to replace all formats with textprotos. Thus, I think there will be a need for type checking. I don't want to deprecate the old formats for configuring downloads if I don't have to.

    def parse_config(file: t.IO) -> Config:

  2. To my knowledge, protobufs don't support datetime values natively, where this is an important datatype for our parser to support. They do support Timestamps and int64s, but I would argue that entering ranges of dates in this format would provide a worse user experience than writing ISO-like date strings.

  3. There is some sugar I would like to provide no matter the format the data is in. For example:

    def parse_mars_syntax(block: str) -> t.List[str]:

    We could just use protos for this, but it would require a lot of repetition on the textproto writers side. This could be automated with a script / code, but then we might as well put this logic as syntax in the format. Indeed, ranges of values come up a lot on our config file, so this sugar / syntax is really important to keep for the user experience.

Maybe there's a way to support both protos and this range syntax? I have an outstanding bug here to do this for the JSON implementation:

# TODO(b/175429166): JSON files do not support MARs range syntax.
In any case, I do think that if we adopted protos, then we'd still want to add some rudimentary type paring capabilities.

As an implementation note: I think that type checking ought to be in a separate method and be called right after parsing the underlying format type, about here:

def process_config(file: t.IO) -> Config:

config = parse_config(file)

A simple way of replacing the string representation of values with literals is to make use of this method from ast:

https://docs.python.org/3/library/ast.html#ast.literal_eval

The motivation for typing data is mentioned in #4. It's really only necessary for creating the name of the output file; in fact, each downloader client is totally fine accepting metadata as strings and downloading them (they'll accept typed values, too, but don't need to). In reviewing this issue and it's siblings, it occurs to me that another approach for achieving #4 would be to type values "just-in-time" for string formatting. By this, I mean that we could have a method that handles MARS types (as defined in #118 (comment)) from strings, and we invoke that method here:

def prepare_target_name(config: Config) -> str:

Fixed in #144.