victorcouste / google-data-catalog-dataprep

Create or update Google Cloud Data Catalog tags with Cloud Dataprep metadata and column profile

Home Page: https://cloud.google.com/dataprep

Google Data Catalog and Cloud Dataprep Tags

Create or update Google Cloud Data Catalog tags on BigQuery tables with Cloud Dataprep metadata and column profiles, using a Cloud Function.

Two Data Catalog tags are created or updated:

  • The Dataprep Job Metadata tag is attached to the BigQuery table and contains information from the Dataprep job used to create or update that table: the user, the Dataprep job (id, name, url, timestamp), the Dataprep dataset (id, name, url), the Dataprep flow (id, name, url), the job profile (url and number of valid, invalid and empty values) and the Dataflow job (id, url).
  • The Dataprep Job Column Profile tag is attached to every column of the BigQuery table and contains the number of valid, invalid and empty values for that column.
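
As a rough illustration of the second tag, the sketch below models one column's profile and maps it to a dictionary of tag field values. The class name, the function name and the field keys (`valid_values`, `invalid_values`, `empty_values`) are hypothetical: the real keys are defined by the tag template, which is not shown here, and the counts in the usage note are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnProfile:
    """Counts reported for one column of the BigQuery table (hypothetical model)."""
    column: str
    valid: int
    invalid: int
    empty: int


def column_profile_fields(profile: ColumnProfile) -> dict:
    """Map a column profile to tag field values.

    The keys are assumptions for illustration; use the field IDs
    of your own Data Catalog tag template instead.
    """
    return {
        "valid_values": profile.valid,
        "invalid_values": profile.invalid,
        "empty_values": profile.empty,
    }
```

For example, `column_profile_fields(ColumnProfile("price", 90, 7, 3))` yields one dictionary per column that could then be written to the column-level tag.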

To enable, learn about and use Cloud Data Catalog, go to https://cloud.google.com/data-catalog and https://console.cloud.google.com/datacatalog.

This repository contains the Python code of the Cloud Function, which is triggered from a Dataprep Webhook to create or update the two Data Catalog tags.

This Cloud Function uses:

In your Cloud Function, you need the 5 files:

Before running the Cloud Function to create or update tags, you need to create the two Data Catalog tag templates for Dataprep (Job Metadata and Job Column Profile). You can use:

To use the Cloud Function, pass the Dataprep job ID as JSON, for example {"job_id":"7827359"}.
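
The payload format above could be validated along these lines before any tag is touched. This is a minimal sketch, not the repository's actual code: the helper name `extract_job_id` and its error messages are assumptions made for illustration.

```python
import json


def extract_job_id(body: bytes) -> str:
    """Return the Dataprep job ID from a JSON body such as {"job_id":"7827359"}."""
    try:
        payload = json.loads(body)
    except ValueError as exc:  # covers invalid JSON and undecodable bytes
        raise ValueError("request body is not valid JSON") from exc

    if not isinstance(payload, dict) or "job_id" not in payload:
        raise ValueError('expected a JSON object with a "job_id" key')

    # Accept the ID as a string or a number and normalize it to a string.
    job_id = str(payload["job_id"]).strip()
    if not job_id:
        raise ValueError('"job_id" must not be empty')
    return job_id
```

Rejecting a malformed body early keeps a bad Webhook configuration from turning into a confusing failure later in the function.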

To trigger it from a Cloud Dataprep flow, configure a Webhook on the Cloud Function endpoint with {"job_id":"$jobId"} as the POST body.
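
To test the function by hand, without going through a Dataprep Webhook, the same POST request can be built in Python. This is a sketch under assumptions: `build_webhook_request` is a hypothetical helper, the endpoint URL is whatever your own deployment exposes, and any authentication your function requires is left out.

```python
import json
import urllib.request


def build_webhook_request(endpoint_url: str, job_id: str) -> urllib.request.Request:
    """Build the POST request a Dataprep Webhook would send for one job."""
    # In a real Webhook, Dataprep fills in the job ID for "$jobId";
    # here the ID is passed in explicitly.
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; building it separately makes the body easy to inspect first.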

Once the Data Catalog tag templates exist and tags have been created or updated on BigQuery tables, you can see the results at https://console.cloud.google.com/datacatalog.

Finally, you can also search Cloud Data Catalog for BigQuery tables that carry a Dataprep tag from your own application, as in https://github.com/victorcouste/dataprep-datacatalog-explorer.


Happy wrangling and tagging!

