JDBraun / isolake

Isolake is a simple, specialized Databricks workspace deployment design on AWS that isolates users and workloads from the public internet, using Unity Catalog and AWS PrivateLink as its foundational architectural components.

Home Page: https://medium.com/databricks-platform-sme/isolake-a-simplistic-deployment-design-to-an-isolated-databricks-lakehouse-on-aws-c0f98b5bbba0

Overview

  • Isolake is an isolated Databricks environment with scoped-down policies, no outbound connectivity to destinations outside Databricks and AWS, Databricks enterprise security features (e.g., PrivateLink, CMK, and audit logs), and, optionally, access restricted to an AWS AppStream instance.

Disclaimer

This Terraform code is provided as a sample for reference and testing purposes only. Please review and modify the code according to your needs before using it in your test environment.

NOTE: The practice of passing credentials through Terraform input variables is intended solely for rapid testing purposes.

  • It should NOT be used in production or any other higher-level environment.
  • For enhanced security, use a more secure method for storing credentials, such as AWS Secrets Manager or environment-specific secrets.
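
One way to keep the OAuth client secret out of input variables is to read it from AWS Secrets Manager when Terraform runs. The sketch below is illustrative only: the secret name `isolake/databricks-oauth`, its JSON keys, and the provider alias `mws` are assumptions and are not taken from this repository. A value read through a data source is still written to Terraform state, so the state backend needs to be protected as well.

```hcl
# Hypothetical secret name and JSON layout; adjust to your own setup.
data "aws_secretsmanager_secret_version" "databricks_oauth" {
  secret_id = "isolake/databricks-oauth"
}

locals {
  # Assumes the secret stores {"client_id": "...", "client_secret": "..."}.
  databricks_oauth = jsondecode(
    data.aws_secretsmanager_secret_version.databricks_oauth.secret_string
  )
}

# Account-level Databricks provider authenticated with the OAuth
# service principal credentials read from the secret.
provider "databricks" {
  alias         = "mws"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = local.databricks_oauth["client_id"]
  client_secret = local.databricks_oauth["client_secret"]
}
```

Alternatively, the Databricks provider can pick up the `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET` environment variables, which keeps the secret out of the configuration files.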

Architecture

[Figure: Isolake architecture diagram]


Version Requirements

| Name | Version |
| --- | --- |
| databricks | 1.27.0 |
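
As a minimal sketch, the provider requirement above could be expressed in a `required_providers` block like the following. Pinning to the exact version is an assumption made here for illustration; the repository may use a looser constraint such as `~>` or `>=`.

```hcl
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      # Exact pin shown for illustration; the real constraint may differ.
      version = "1.27.0"
    }
  }
}
```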

Inputs - Main "Isolake" Module

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| aws_account_id | AWS account ID for integration | string | n/a | yes |
| client_id | Databricks client ID for OAuth | string | n/a | yes |
| client_secret | Databricks client secret for OAuth | string | n/a | yes |
| databricks_account_id | Databricks account ID | string | n/a | yes |
| region | AWS region for resource deployment | string | n/a | yes |
| resource_owner | Owner of the resources for tracking and management | string | n/a | yes |
| resource_prefix | Prefix for naming created resources | string | n/a | yes |
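
For a quick test, the main module inputs could be supplied through a `terraform.tfvars` file along these lines. Every value shown is a placeholder and should be replaced with your own account details.

```hcl
# terraform.tfvars (placeholder values only)
aws_account_id        = "123456789012"
databricks_account_id = "00000000-0000-0000-0000-000000000000"
client_id             = "00000000-0000-0000-0000-000000000000"
region                = "us-east-1"
resource_owner        = "owner@example.com"
resource_prefix       = "isolake-test"

# client_secret is intentionally omitted here; for a quick test it can be
# supplied through the TF_VAR_client_secret environment variable instead.
```

Terraform reads any environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`, so the secret does not have to be written to a file on disk.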

Sub Modules

| Name | Source | Version |
| --- | --- | --- |
| audit | ./audit | n/a |
| databricks_cluster | ./cluster | n/a |
| databricks_mws_workspace | ./workspace | n/a |
| lockdown_access | ./lockdown/access | n/a |
| lockdown_data_bucket | ./lockdown/data_bucket | n/a |
| lockdown_dbfs | ./lockdown/dbfs | n/a |
| lockdown_nacls | ./lockdown/nacls | n/a |
| uc_data | ./uc_data | n/a |
| uc_init | ./uc_init | n/a |
| uc_assignment | ./uc_assignment | n/a |
| vpc | terraform-aws-modules/vpc/aws | 5.1.1 |
| vpc_endpoints | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | 3.11.0 |
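
To show how a registry module from this table is typically declared, here is a hedged sketch of the `vpc` entry. The source and version come from the table; the resource name suffix, the availability zone lookup, and the choice of arguments are assumptions, and the repository's actual wiring may differ. The variables `vpc_cidr_range` and `private_subnets_cidr` are those listed under "Inputs - Sub Modules".

```hcl
# Look up the availability zones usable in the configured region.
data "aws_availability_zones" "available" {
  state = "available"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.1"

  name = "${var.resource_prefix}-vpc" # hypothetical naming scheme
  cidr = var.vpc_cidr_range
  azs  = data.aws_availability_zones.available.names

  # Private subnets only, with no NAT gateway, in keeping with the
  # design goal of no path to the public internet.
  private_subnets      = var.private_subnets_cidr
  enable_dns_hostnames = true
  enable_nat_gateway   = false
}
```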

Inputs - Sub Modules

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| aws_account_id | AWS account ID for integration | string | n/a | yes |
| client_id | Databricks client ID for OAuth | string | n/a | yes |
| client_secret | Databricks client secret for OAuth | string | n/a | yes |
| control_plane_ip | IP for Databricks control plane | string | n/a | yes |
| data_access | Name of the user or entity that will be given read access to the data in UC | string | n/a | yes |
| data_bucket | Name of the existing data bucket | string | n/a | yes |
| databricks_account_id | Databricks account ID | string | n/a | yes |
| dbfsname | S3 bucket name for the workspace root storage | string | n/a | yes |
| enable_cluster_example | Flag to enable example cluster with Derby Metastore | bool | n/a | yes |
| enable_dbfs_lockdown | Lockdown on workspace root bucket | bool | n/a | yes |
| enable_front_end_lockdown | Flag to enable frontend lockdown | bool | n/a | yes |
| enable_nacl_lockdown | Lockdown on private subnet NACLs | bool | n/a | yes |
| enable_read_only_data_bucket_lockdown | Read-only lockdown on data bucket | bool | n/a | yes |
| full_region_name | Full name of the region for restrictive DBFS bucket policies | string | n/a | yes |
| private_subnets_cidr | CIDR blocks for private subnets | list(string) | n/a | yes |
| region | AWS region for resource deployment | string | n/a | yes |
| region_name | Short name of the region | string | n/a | yes |
| relay_vpce_service | VPCE service for relay | string | n/a | yes |
| resource_owner | Owner of the resources for tracking and management | string | n/a | yes |
| resource_prefix | Prefix for naming created resources | string | n/a | yes |
| restricted_uc_bucket_policy | Restrictive policy on Unity Catalog bucket | bool | n/a | yes |
| sg_egress_protocol | Allowed protocols for within security group egress | list(string) | n/a | yes |
| sg_ingress_protocol | Allowed protocols for within security group ingress | list(string) | n/a | yes |
| system_arn | System ARN for bucket policies | string | n/a | yes |
| system_ip | System IP for administrative access and bucket policies | string | n/a | yes |
| ucname | S3 bucket name for the Unity Catalog (UC) metastore | string | n/a | yes |
| vpc_cidr_range | CIDR range for the VPC | string | n/a | yes |
| workspace_vpce_service | VPCE service for workspace | string | n/a | yes |
| ws_ld_availability_zones | Availability zone for AppStream | list(string) | n/a | yes |
| ws_ld_private_subnets_cidr | CIDR for AppStream private subnets | list(string) | n/a | yes |
| ws_ld_vpc_cidr_range | CIDR range for AppStream VPC | string | n/a | yes |
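
The `enable_*` flags above suggest that the lockdown modules are optional. A common Terraform pattern for this is a `count` toggle on the module block, sketched below for `enable_nacl_lockdown`. This is an assumption about how such a flag could be wired, not a copy of the repository's code, and the arguments passed to the module are hypothetical.

```hcl
variable "enable_nacl_lockdown" {
  description = "Lockdown on private subnet NACLs"
  type        = bool
}

module "lockdown_nacls" {
  # Create the module's resources only when the flag is true.
  count  = var.enable_nacl_lockdown ? 1 : 0
  source = "./lockdown/nacls"

  # Hypothetical arguments; the real module may expect different inputs.
  vpc_id               = module.vpc.vpc_id
  private_subnets_cidr = var.private_subnets_cidr
}
```

With this pattern, setting the flag to `false` leaves the module out of the plan entirely, so the lockdown can be enabled only after the base workspace has been verified.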

Bug Fixes or Feature Requests

  • Please raise a GitHub issue to report bugs or propose features for the Isolake deployment design.

About

License: Apache License 2.0


Languages

Language: HCL 100.0%