modusnl / GLEIF

This project demonstrates various approaches to processing large XML files, using the publicly available GLEIF datasets.

GLEIF (multi-GB XML handling)

This project demonstrates various approaches to processing large XML files, using the publicly available LEI2 dataset.

Architecture

[Architecture diagram: FunctionApp]

Projects

See Data for the scripts that download the data from the public API endpoint and upload the files to the Data Lake Store.

See ConsoleApp for the .NET Core command-line app, a starting point for deriving smaller files from the big XML file, with many useful .NET methods for reading, writing, validating, and serializing XML.

See FunctionApp for the Functions that host the .NET Core snippets and are (mostly) triggered by new Blob events.

See Database for the T-SQL approaches.

See Databricks for the Spark approaches.

See DataLake for the U-SQL approaches (outdated; use FunctionApp and Databricks instead).
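The projects above share one core idea: a multi-GB XML file should be read as a stream, one record at a time, rather than loaded into memory as a whole. The following is a minimal Python sketch of that idea, not the repository's .NET code. The element names `LEIRecord` and `LEI`, the `urn:example:lei` namespace, and the sample identifiers are assumptions made for illustration; check them against the actual file before relying on them.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, record_tag="LEIRecord"):
    """Yield each record element from an XML stream, one at a time.

    record_tag is an assumed element name; adjust it to the real schema.
    """
    path = []  # stack of currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem)
            continue
        path.pop()  # elem has just been closed
        if local_name(elem.tag) == record_tag:
            yield elem
            # Detach the finished record from its parent so memory use
            # stays flat no matter how many records the file contains.
            if path:
                path[-1].remove(elem)


def find_text(elem, child_tag):
    """Return the text of the first descendant with the given local name."""
    for child in elem.iter():
        if local_name(child.tag) == child_tag:
            return (child.text or "").strip()
    return None


# Tiny in-memory stand-in for the real file (hypothetical namespace and IDs).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<lei:LEIData xmlns:lei="urn:example:lei">
  <lei:LEIRecords>
    <lei:LEIRecord><lei:LEI>EXAMPLE0000000000001</lei:LEI></lei:LEIRecord>
    <lei:LEIRecord><lei:LEI>EXAMPLE0000000000002</lei:LEI></lei:LEIRecord>
  </lei:LEIRecords>
</lei:LEIData>"""

leis = [find_text(record, "LEI") for record in iter_records(io.BytesIO(SAMPLE))]
print(leis)
```

For the real dataset, `iter_records` can be given a file path or an open binary file instead of the in-memory sample. Matching on the local name keeps the sketch independent of the exact namespace URI.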

GLEIF

The Global Legal Entity Identifier Foundation (GLEIF) is tasked with supporting the implementation and use of the Legal Entity Identifier (LEI). The LEI enables clear and unique identification of legal entities engaging in financial transactions.

LEI data is a good open data source for demonstrating multi-GB XML handling while working with a valuable dataset. We chose it because we believe in working software over comprehensive documentation.

Data

Run Download-LEI2.ps1 to download the 155 MB Zip file and extract the 2.6 GB XML file.
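Because the extracted file is many times larger than the archive, the extraction step is best done as a stream. The following is a minimal Python sketch of that step only, assuming the Zip file has already been downloaded. It is not a port of Download-LEI2.ps1, and the function name `extract_first_xml` is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def extract_first_xml(zip_path, out_dir):
    """Stream the first .xml member of a Zip archive into out_dir.

    Returns the path of the extracted file. The member is copied in
    1 MiB chunks, so the full XML never has to fit in memory.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        xml_members = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_members:
            raise ValueError(f"no .xml member found in {zip_path}")
        member = xml_members[0]
        # Use only the file name so archive-internal folders are ignored.
        target = out_dir / Path(member).name
        with archive.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst, length=1024 * 1024)
    return target
```

Called as `extract_first_xml("lei2.zip", "data")`, where both names are placeholders, it would write the XML file into the `data` folder and return its path.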

About the LEI data format: LEI Level 1 data CDF v2.1

About downloading the concatenated files: gleif.org/gleif-concatenated-file

License: MIT License


Languages

C# 86.8%, PowerShell 6.5%, TSQL 6.3%, XSLT 0.3%, Python 0.1%