zillow / intake-nested-yaml-catalog

Supports a hierarchical catalog in a single YAML file to organize datasets and avoid a data swamp.


Welcome to the Intake plugin for nested YAML catalogs

This is an Intake plugin that supports a hierarchical catalog in a single YAML file, which helps organize datasets and avoid a data swamp.

Example of organizing datasets by business domain entity:

metadata:
  hierarchical_catalog: true
entity:
  customer:
    customer_attributes:
      args:
        urlpath: s3://foo
      driver: parquet
  user:
    user_profile:
      args:
        urlpath: s3://foo
      driver: parquet

With the catalog loaded as catalog, a dataset can be read as:

df = catalog.entity.customer.customer_attributes.read()
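
To make the mapping from nested YAML keys to dotted attribute access concrete, here is a minimal, dependency-free sketch. It is not the plugin's implementation: the CatalogNode class is hypothetical, the catalog is written as a plain dict instead of being parsed from YAML, and treating any mapping with a "driver" key as a dataset entry is an assumption made for illustration.

```python
from typing import Any, Dict

# Same structure as the YAML example above, written as a plain dict so the
# sketch needs no YAML parser.
CATALOG: Dict[str, Any] = {
    "metadata": {"hierarchical_catalog": True},
    "entity": {
        "customer": {
            "customer_attributes": {
                "args": {"urlpath": "s3://foo"},
                "driver": "parquet",
            }
        },
        "user": {
            "user_profile": {
                "args": {"urlpath": "s3://foo"},
                "driver": "parquet",
            }
        },
    },
}


class CatalogNode:
    """Hypothetical helper that exposes nested dict keys as attributes."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # __getattr__ runs only when normal attribute lookup fails.
        # Private names are never looked up in the catalog data.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Assumption: a mapping with a "driver" key is a dataset entry (leaf);
        # any other mapping is an intermediate level of the hierarchy.
        if isinstance(value, dict) and "driver" not in value:
            return CatalogNode(value)
        return value


catalog = CatalogNode(CATALOG)
entry = catalog.entity.customer.customer_attributes
print(entry["driver"], entry["args"]["urlpath"])  # prints: parquet s3://foo
```

In this sketch the leaf is returned as a plain dict describing the dataset. In the plugin, the object at that position is an Intake entry, which is why the example above can call .read() on it.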

About

License: Apache License 2.0


Languages

Language:Python 100.0%