mirko-leccese / Clean-Azure-Copy-Activity-Mapping

Python script that cleans the JSON document used by an Azure Data Factory or Azure Synapse Copy Activity to specify the source-sink mapping.


Clean-Azure-Copy-Activity-Mapping

An Azure Data Factory or Azure Synapse Pipelines Copy Activity fails when copying tables into Azure Data Lake Storage Gen2 in Parquet format if the column headers contain characters such as whitespace, brackets, etc. To avoid this error, you need to remove these undesired characters on the sink side when specifying the schema mapping. For tables with many columns, doing this by hand is tedious.

This simple Python script reads a JSON file specifying the schema mapping and strips the undesired characters from the column headers. The typical structure of such a JSON file is:

{
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": "customer id",
                "sink": "customer id"
            },
            {
                "source": "orders",
                "sink": "orders"
            }
        ]
    }
}
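For reference, the sink column names that need cleaning sit under `translator` → `mappings` → `sink`. A minimal sketch of reading them with the standard `json` module (the mapping document is inlined here as a string purely for illustration):

```python
import json

# Example mapping document, as exported from a Copy Activity
raw = '''{
    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {"source": "customer id", "sink": "customer id"},
            {"source": "orders", "sink": "orders"}
        ]
    }
}'''

doc = json.loads(raw)
# Collect the sink column names that will be cleaned
sinks = [m["sink"] for m in doc["translator"]["mappings"]]
```

In a real run the document would be loaded from a file with `json.load` instead of parsed from an inline string.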

The script produces a new JSON file called filename_Cleaned.json in which every sink field of the mappings array has been cleaned.

The present version of the script targets the following characters:

characterToCheck = [",", ";", "{", "}", "(", ")", "\n", "\t", "=", " "]

and replaces each of them with a -.
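The repository script's exact implementation may differ, but the cleaning step it describes can be sketched as follows. The helper names (`clean`, `clean_file`) are illustrative, not taken from the repo:

```python
import json
import os

# Characters listed in the README as undesired in Parquet column headers
characterToCheck = [",", ";", "{", "}", "(", ")", "\n", "\t", "=", " "]

def clean(name):
    # Replace every undesired character with "-"
    for ch in characterToCheck:
        name = name.replace(ch, "-")
    return name

def clean_file(path):
    # Load the mapping document, clean each sink field,
    # and write the result next to the original as <name>_Cleaned.json
    with open(path) as f:
        doc = json.load(f)
    for m in doc["translator"]["mappings"]:
        m["sink"] = clean(m["sink"])
    root, ext = os.path.splitext(path)
    out_path = f"{root}_Cleaned{ext}"
    with open(out_path, "w") as f:
        json.dump(doc, f, indent=4)
    return out_path
```

With this sketch, a sink such as "customer id" becomes "customer-id", while the source field is left untouched so the mapping still matches the original table.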

Installation

No installation required. To run the script, type:

./clean_azure_copy_mapping.py 

and enter the JSON filename when prompted.
