intake / intake

Intake is a lightweight package for finding, investigating, loading and disseminating data.

Home Page: https://intake.readthedocs.io/

Is it possible to specify a Python function as `args` in a YAMLFileCatalog?

dougiesquire opened this issue

I'd like to create a YAMLFileCatalog that includes a source with an argument that is a Python function. Is there a way to do this?

An example using a CSV source:

import ast
import intake
import pandas

# Create the source data
data = {
    "col0": [["a","b"], ["c","d","e"]], 
    "col1": [0, 1]
}
pandas.DataFrame(data).to_csv("test.csv", index=False)

# Open using intake with converter function
source = intake.open_csv("test.csv", csv_kwargs={"converters": {"col0": ast.literal_eval}})
source.read()  # works: col0 is parsed back into Python lists

# Can we write a YAML catalog with the converter function specified in args?
with open("test.yaml", "w") as f:
    f.write(source.yaml())

# It seems not (this fails)
intake.open_catalog("test.yaml")

The contents of test.yaml look like this:

sources:
  csv:
    args:
      csv_kwargs:
        converters:
          col0: !!python/name:ast.literal_eval ''
      urlpath: test.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
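The failure can be reproduced with PyYAML alone. The sketch below assumes PyYAML 5.1 or later is installed (for `yaml.unsafe_load`) and parses the YAML above in two ways: `yaml.safe_load` has no constructor for the `!!python/name:` tag and raises, while `yaml.unsafe_load` imports `ast` and returns the real function object. It illustrates plain YAML behaviour only and is not Intake's own loading code.

```python
import ast

import yaml

# The catalog text shown above, as written by the CSV source's yaml() method.
CATALOG_TEXT = """\
sources:
  csv:
    args:
      csv_kwargs:
        converters:
          col0: !!python/name:ast.literal_eval ''
      urlpath: test.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
"""


def safe_load_or_error(text):
    """Parse with yaml.safe_load; return the data, or the exception it raised."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as err:
        return err


# safe_load knows no python/* tags, so this returns a YAMLError instance.
safe_result = safe_load_or_error(CATALOG_TEXT)
print(type(safe_result).__name__)

# unsafe_load imports the named module and returns the function itself.
# Never use it on catalog files from untrusted sources.
unsafe_result = yaml.unsafe_load(CATALOG_TEXT)
converter = unsafe_result["sources"]["csv"]["args"]["csv_kwargs"]["converters"]["col0"]
print(converter is ast.literal_eval)
```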

YAML does allow Python objects to be specified like that, but Intake explicitly "safe loads" YAML, to prevent reading a catalog file from executing arbitrary code. In a few places, it is possible to specify a function as a string such as "package.module.funcname" (or possibly with ":" instead of the final "."; see https://intake.readthedocs.io/en/latest/transforms.html#functional-example), but such strings are NOT evaluated by loading alone; whether and how they are resolved depends on the driver in question. The CSV driver does not know about any function-type arguments.
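A driver that wanted to accept such strings would have to resolve them itself. The following is a minimal sketch, assuming the converter is stored in the catalog as a plain string like "ast.literal_eval" (which safe loading permits). `resolve_callable` and `read_csv_with_converters` are hypothetical helpers, not part of Intake's API, and the standard-library `csv` module stands in for pandas to keep the example self-contained.

```python
import ast
import csv
import importlib
import io


def resolve_callable(spec):
    """Resolve "package.module.funcname" or "package.module:funcname".

    Hypothetical helper mirroring the string convention described above;
    it is not part of Intake.
    """
    if ":" in spec:
        module_name, attr_path = spec.split(":", 1)
    else:
        module_name, _, attr_path = spec.rpartition(".")
    if not module_name or not attr_path:
        raise ValueError(f"not a dotted callable reference: {spec!r}")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj


def read_csv_with_converters(text, converters):
    """Read CSV text, applying converters given as dotted-name strings.

    `converters` is what a safely loaded catalog could carry:
    plain strings, e.g. {"col0": "ast.literal_eval"}.
    """
    funcs = {col: resolve_callable(name) for col, name in converters.items()}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({col: funcs[col](val) if col in funcs else val
                     for col, val in row.items()})
    return rows


# Same layout as the test.csv written by pandas in the question.
CSV_TEXT = 'col0,col1\n"[\'a\', \'b\']",0\n"[\'c\', \'d\', \'e\']",1\n'
rows = read_csv_with_converters(CSV_TEXT, {"col0": "ast.literal_eval"})
print(rows)
```

The import happens only when this hypothetical reader runs, not when the YAML is parsed, which matches the distinction drawn above. Importing a module named in a catalog still runs that module's code, though, so the same trust considerations apply at read time.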