bmsuisse / lakeapi2sql

Read from the lake api and write to MS SQL Server

LakeApi 2 SQL

This is a simple library that currently has the following methods:

  • insert_http_arrow_stream_to_sql

    Makes an HTTP request to a Lake API endpoint and inserts the data into MS SQL Server via bulk insert. In theory, you could also get the data from any other HTTP endpoint that returns an Arrow stream and is authenticated using Basic Auth. The insert is not guaranteed to be atomic at the SQL Server level, so you will usually want to use a global temp table as the target.

  • insert_record_batch_to_sql

    Same as above, but the input is a generic pyarrow RecordBatchReader instead of an HTTP endpoint.
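
To make the Basic Auth requirement above concrete, here is a minimal sketch of what such a request looks like when built by hand with the Python standard library. This is not the library's implementation (that lives in Rust). The helper name `build_arrow_stream_request`, the example URL and credentials, and the `Accept` header value are all assumptions for illustration.

```python
import base64
import urllib.request


def build_arrow_stream_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries HTTP Basic Auth credentials."""
    # Basic Auth is "user:password", base64-encoded, prefixed with "Basic ".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            # Assumption: the endpoint honours the Arrow IPC stream media type.
            "Accept": "application/vnd.apache.arrow.stream",
        },
        method="GET",
    )


# Hypothetical endpoint and credentials, for illustration only.
request = build_arrow_stream_request(
    "https://lake.example.com/api/v1/sales", "user", "secret"
)
```

Any endpoint that accepts a request of this shape and answers with an Arrow stream should, in theory, be usable as a source.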

It's meant to be used from Python; the logic is written in Rust.

Features

  • You can specify Authentication=ActiveDirectoryMSI|ActiveDirectoryDefault|ActiveDirectoryInteractive in the connection string, similar to the .NET/ODBC SQL drivers. This requires the azure-identity package to be installed.
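
As a rough illustration of the Authentication keyword, the sketch below splits a connection string into key/value pairs and checks whether it asks for one of the three Active Directory modes listed above. It is a simplified stand-in, not the parser the library actually uses: the helper names, the example server and database, and the naive handling of `;` (quoted values are not supported) are assumptions.

```python
# The three modes named in the feature list, lower-cased for comparison.
AD_AUTH_MODES = {
    "activedirectorymsi",
    "activedirectorydefault",
    "activedirectoryinteractive",
}


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    parts: dict[str, str] = {}
    for item in conn_str.split(";"):
        if not item.strip():
            continue  # tolerate a trailing or doubled semicolon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"Expected 'key=value', got {item!r}")
        parts[key.strip().lower()] = value.strip()
    return parts


def needs_azure_identity(conn_str: str) -> bool:
    """Return True if the string requests one of the Active Directory modes."""
    auth = parse_connection_string(conn_str).get("authentication", "")
    return auth.lower() in AD_AUTH_MODES


# Hypothetical server and database names, for illustration only.
example = (
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Authentication=ActiveDirectoryDefault;"
)
```

A check like `needs_azure_identity(example)` could be used to fail early with a clear message when azure-identity is not installed.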

Roadmap

There is still a lot to do:

  • Allow passing more flexible HTTP Authentication options
  • Add option to read from database and write to a flat file
  • Write documentation
  • Add tests

Alternatives

This would not have been possible without the excellent arrow-odbc-py library. Use it whenever SQL Server is not the only possible target, you need to read from a database, or you just need something better ;)

About

License: MIT License


Languages

Rust 74.6%, Python 17.3%, TSQL 8.1%