A custom dbt adapter for Azure Synapse. Major credit is due to @mikaelene and his sqlserver custom adapter.
- macros use only Azure Synapse T-SQL. Relevant GitHub issue
- use of Create Table as Select (CTAS) means you don't need post-hooks to create indices
- Azure Active Directory Authentication options
- rewrite of snapshots because Synapse doesn't support MERGE.
- external table creation via details from yaml.
  - must first create EXTERNAL DATA SOURCE and EXTERNAL FILE FORMATs.
As of now, only dbt 0.18.0 is supported.
Passing all tests in dbt-adapter-tests, except test_dbt_ephemeral_data_tests
- ephemeral materializations (workaround for non-recursive CTEs)
- auto-create EXTERNAL DATA SOURCE and EXTERNAL FILE FORMATs.
- officially rename the adapter from sqlserver to synapse
- Use CTAS to create seeds?
- Add support for ActiveDirectoryMsi
Easiest install is to use pip (not yet registered on PyPI).
First install ODBC Driver version 17.
pip install dbt-synapse
On Ubuntu make sure you have the ODBC header files before installing
sudo apt install unixodbc-dev
The following is needed for every target definition, for both SQL Server and Azure SQL. The sections below detail how to connect to SQL Server and Azure SQL specifically.
type: synapse
driver: 'ODBC Driver 17 for SQL Server' # the ODBC driver installed on your system
server: server-host-name or ip
port: 1433
schema: schemaname
Encryption is not enabled unless you specify it.
To enable encryption, add the following to your target definition. This is the default encryption strategy recommended by MSFT. For more information see this docs page
encrypt: true # adds "Encrypt=Yes" to connection string
trust_cert: false
For a fully secure, encrypted connection, you must set trust_cert: false, because "TrustServerCertificate=Yes" is the default for dbt-sqlserver in order to not break already defined targets.
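The mapping from these two profile keys to the ODBC connection string can be sketched as follows. This is only an illustration: `encryption_options` is a hypothetical helper, not a function of the adapter, and the assumption that the certificate setting is emitted only when encryption is on is mine rather than something the adapter documents.

```python
def encryption_options(encrypt: bool = False, trust_cert: bool = True) -> str:
    """Return the connection-string fragment implied by the two profile keys.

    Hypothetical helper. The defaults mirror the text above: encryption is
    off unless requested, and TrustServerCertificate=Yes unless trust_cert
    is explicitly set to false.
    """
    if not encrypt:
        # Nothing is added when encryption was not requested (assumption).
        return ""
    parts = ["Encrypt=Yes"]
    # Trusting the server certificate blindly skips certificate validation,
    # so a fully secure connection needs trust_cert: false.
    parts.append("TrustServerCertificate=" + ("Yes" if trust_cert else "No"))
    return ";".join(parts)


if __name__ == "__main__":
    print(encryption_options(encrypt=True, trust_cert=False))
```

With `encrypt: true` and `trust_cert: false`, the sketch yields a fragment that both encrypts the connection and validates the server certificate.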
SQL Server credentials are supported for on-prem as well as cloud, and this is the default authentication method for dbt-sqlserver.
user: username
password: password
The following pyodbc-supported ActiveDirectory methods are available to authenticate to Azure SQL:
- Azure CLI
- ActiveDirectory Password
- ActiveDirectory Interactive
- ActiveDirectory Integrated
- Service Principal (a.k.a. AAD Application)
- ActiveDirectory MSI (not implemented)
However, the Azure CLI is the ideal way to authenticate instead of using the built-in ODBC ActiveDirectory methods, for reasons detailed below.
Use the authentication of the Azure command line interface (CLI). First install the Azure CLI, then log in:
az login
Then, set authentication in profiles.yml to CLI:
authentication: CLI
This is also the preferred route for using a service principal:
az login --service-principal --username $CLIENTID --password $SECRET --tenant $TENANTID
This avoids storing a secret as plain text in profiles.yml.
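For readers curious how a token-based login can reach the ODBC driver at all, the sketch below shows the generally documented pyodbc pattern: the access token is packed into a length-prefixed UTF-16-LE byte structure and passed as the `SQL_COPT_SS_ACCESS_TOKEN` connection attribute (id 1256). Whether this adapter does exactly this is an assumption; `pack_access_token` is a hypothetical name, and the token itself would come from the Azure CLI session created by `az login`.

```python
import struct

# Connection attribute id used by the Microsoft ODBC driver for access tokens.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def pack_access_token(token: str) -> bytes:
    """Pack a token as a 4-byte length prefix followed by UTF-16-LE bytes.

    Hypothetical helper illustrating the structure the driver expects.
    """
    encoded = token.encode("utf-16-le")
    return struct.pack("=i", len(encoded)) + encoded


if __name__ == "__main__":
    packed = pack_access_token("example-token")
    print(len(packed))
    # With pyodbc installed, the packed token would be passed like this
    # (conn_str is a placeholder and must not contain UID/PWD):
    #   pyodbc.connect(conn_str, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: packed})
```

Because the secret never appears in profiles.yml, only the short-lived token has to be handled in memory.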
ActiveDirectoryPassword is definitely not ideal, but it is available:
authentication: ActiveDirectoryPassword
user: bill.gates@microsoft.com
password: i<3opensource?
ActiveDirectoryInteractive brings up the Azure AD prompt so you can complete MFA if need be. The downside to this approach is that you must log in each time you run a dbt command!
authentication: ActiveDirectoryInteractive
user: bill.gates@microsoft.com
ActiveDirectoryIntegrated uses your machine's credentials (which might be disabled by your AAD admins). It also requires that you have Active Directory Federation Services (ADFS) installed and running, which is only the case if you have an on-prem Active Directory linked to your Azure AD.
authentication: ActiveDirectoryIntegrated
client_* and app_* can be used interchangeably. Again, it is not recommended to store a service principal secret in plain text in your profiles.yml. The CLI auth method is preferred.
authentication: ServicePrincipal
tenant_id: tenantid
client_id: clientid
client_secret: clientsecret
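One way to picture the interchangeable `client_*` and `app_*` keys is as a simple alias table applied before the target is validated. The sketch below is an assumption about how such aliasing could work, not the adapter's code: `ALIASES`, `normalize_credentials` and `missing_keys` are hypothetical names, and the exact `app_id` / `app_secret` spellings are inferred from the `app_*` pattern.

```python
# Hypothetical alias table: app_* keys are treated as synonyms of client_*.
ALIASES = {"app_id": "client_id", "app_secret": "client_secret"}

# Keys shown in the ServicePrincipal target definition above.
REQUIRED = ("tenant_id", "client_id", "client_secret")


def normalize_credentials(target: dict) -> dict:
    """Return a copy of the target with app_* keys renamed to client_*."""
    return {ALIASES.get(key, key): value for key, value in target.items()}


def missing_keys(target: dict) -> list:
    """List required service principal keys absent after normalization."""
    normalized = normalize_credentials(target)
    return [key for key in REQUIRED if key not in normalized]


if __name__ == "__main__":
    target = {
        "authentication": "ServicePrincipal",
        "tenant_id": "tenantid",
        "app_id": "clientid",
        "app_secret": "clientsecret",
    }
    print(normalize_credentials(target))
    print(missing_keys(target))
```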
CTAS allows you to materialize tables with indices and distributions at creation time, which obviates the need for post-hooks to set indices.
You can also configure index and dist in dbt_project.yml.
{{
config(
index='HEAP',
dist='ROUND_ROBIN'
)
}}
select *
from ...
The model above is turned into the relative form below (minus the __dbt_backup and __dbt_tmp tables):
CREATE TABLE ajs_stg.absence_hours
WITH(
DISTRIBUTION = ROUND_ROBIN,
HEAP
)
AS (SELECT * FROM ajs_stg.absence_hours__dbt_tmp_temp_view)
Indices:
- CLUSTERED COLUMNSTORE INDEX (default)
- HEAP
- CLUSTERED INDEX ({COLUMN})
Distributions:
- ROUND_ROBIN (default)
- HASH({COLUMN})
- REPLICATE
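To make the relationship between the `index` and `dist` config values and the generated CTAS statement concrete, here is a minimal sketch. It is not the adapter's macro (which is written in Jinja); `render_ctas` is a hypothetical function, and the defaults are taken from the option lists above.

```python
def render_ctas(relation: str, select_sql: str,
                index: str = "CLUSTERED COLUMNSTORE INDEX",
                dist: str = "ROUND_ROBIN") -> str:
    """Render a CREATE TABLE AS SELECT statement with index and distribution.

    Hypothetical helper; index and dist are inserted verbatim, e.g.
    index="CLUSTERED INDEX (id)" or dist="HASH(id)".
    """
    return (
        f"CREATE TABLE {relation}\n"
        "WITH(\n"
        f"  DISTRIBUTION = {dist},\n"
        f"  {index}\n"
        ")\n"
        f"AS ({select_sql})"
    )


if __name__ == "__main__":
    print(render_ctas(
        "ajs_stg.absence_hours",
        "SELECT * FROM ajs_stg.absence_hours__dbt_tmp_temp_view",
        index="HEAP",
    ))
```

Because the index and distribution are part of the `WITH` clause, the table is created in its final shape in one statement, which is why no post-hook is needed.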
sources:
  - name: raw
    schema: source
    loader: ADLSblob
    tables:
      - name: absence_hours
        description: |
          from raw DW.
        external:
          data_source: SynapseContainer
          location: /absence_hours_live/
          file_format: CommaDelimited
          reject_type: VALUE
          reject_value: 0
        columns:
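As a rough picture of what the `external` block describes, the sketch below turns those yaml details into a `CREATE EXTERNAL TABLE` statement. It is an illustration only: the real statement is produced by the dbt-external-tables macros, `render_external_table` is a hypothetical function, and the two columns are invented because the source leaves `columns:` empty.

```python
def render_external_table(schema: str, table: str, columns: list, external: dict) -> str:
    """Render a Synapse CREATE EXTERNAL TABLE statement from source details.

    Hypothetical helper; `columns` is a list of (name, data_type) pairs and
    `external` mirrors the keys of the yaml `external` block.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        "WITH (\n"
        f"  LOCATION = '{external['location']}',\n"
        f"  DATA_SOURCE = {external['data_source']},\n"
        f"  FILE_FORMAT = {external['file_format']},\n"
        f"  REJECT_TYPE = {external['reject_type']},\n"
        f"  REJECT_VALUE = {external['reject_value']}\n"
        ")"
    )


if __name__ == "__main__":
    external = {
        "data_source": "SynapseContainer",
        "location": "/absence_hours_live/",
        "file_format": "CommaDelimited",
        "reject_type": "VALUE",
        "reject_value": 0,
    }
    # Column names and types are made up for the example.
    columns = [("employee_id", "int"), ("hours", "decimal(9, 2)")]
    print(render_external_table("source", "absence_hours", columns, external))
```

The `DATA_SOURCE` and `FILE_FORMAT` values refer to objects that must already exist, which is why the EXTERNAL DATA SOURCE and EXTERNAL FILE FORMAT have to be created first.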
Adds support for:
- SQL Server down to version 2012
- authentication via:
  - Azure CLI (see #71, thanks @JCZuurmond!), and
  - MSFT ODBC Active Directory options (#53 #55 #58 thanks to @NandanHegde15 and @alieus)
- using a named instance (#51 thanks @alangsbo)
- Adds support for Azure Active Directory as authentication provider
- Fix for lack of precision in the snapshot check strategy. (#74 and #56 thanks @qed) Previously, executing two check snapshots within the same second produced inconsistent data. This was mostly noticed when running the automatic adapter tests. NOTE: This fix will create a new snapshot version in the target table on the first run after upgrade.
- #52 Fix deprecation warning (Thanks @jnoynaert)
- The adapter is now automatically tested with Fishtown's official adapter-tests to increase stability when making changes and upgrades to the adapter. (#62 #64 #69 #74)
- We are also now testing specific target configs to make the devs more confident that everything is in working order (#75)
- Adds support for dbt v0.18.0
- Add CI testing (#19)
- Remove the external table macros in favor of pulling them directly from dbt-external-tables
- Bundle the "INSERT & UPDATE" MERGE workaround into a transaction that can be rolled back (#23)
- Handle nulls in csv file for seeds (#20)
- Verified that adapter works with dbt version v0.18.1
- pull AD auth directly from dbt-sqlserver (microsoft#13)
- hotfix for broken create_view() macro (microsoft#14)
- get dbt-adapter-tests up and running (microsoft#16)
  - make sqlserver__drop_schema() also drop all tables and views associated with schema
  - introduce sqlserver__get_columns_in_query() for use with testing
  - align macro args with dbt-base
- added snapshot functionality
- initial release