terraform-google-bigquery
Terraform module to create Google Cloud Platform BigQuery datasets and tables. It lets you programmatically create an empty table with a defined schema inside a dataset, ready for loading. Additional user accounts and permissions are necessary before the newly created table(s) can be queried.
Resources
- BigQuery Dataset
- BigQuery Table
- BigQuery Dataset IAM
- BigQuery Table IAM
- BigQuery Job Load
- Google Project IAM
- Google IAM Predefined Roles
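Because additional permissions are needed before anyone can query the tables, read access is typically granted at the dataset level. The following is a minimal sketch using the Google provider's `google_bigquery_dataset_iam_member` resource. The dataset ID and the member address are placeholders assumed for illustration, and `local.project_id` is assumed to be defined elsewhere in your configuration; none of these values are defined by this module.

```hcl
# Hypothetical example: grant a single user read access to one dataset.
resource "google_bigquery_dataset_iam_member" "viewer" {
  project    = local.project_id            # assumed local value
  dataset_id = "nyt_covid_dataset"         # placeholder dataset ID
  role       = "roles/bigquery.dataViewer" # predefined read-only role
  member     = "user:analyst@example.com"  # placeholder principal
}
```

Running queries also requires permission to create jobs in the project, for example `roles/bigquery.jobUser` granted through `google_project_iam_member`.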
Example Usage
// The BigQuery API is required
resource "google_project_service" "bigquery_service" {
  project = local.project_id
  service = "bigquery.googleapis.com"
}
module "bigquery" {
  source = "app.terraform.io/Seagen/bigquery/google"
  version = "5.2.0"
  dataset_id = google_bigquery_dataset.dataset.dataset_id
  dataset_name = "nyt-covid-dataset"
  description = google_bigquery_dataset.dataset.description
  project_id = local.project_id
  location = google_bigquery_dataset.dataset.location
  tables = [
    {
      table_id = google_bigquery_table.table.table_id,
      schema = file("bigquery/nyt_covid_count_by_state_schema.json"),
      time_partitioning = null,
      range_partitioning = null,
      expiration_time = null,
      clustering = [],
      labels = local.labels,
    }
  ]
  dataset_labels = local.labels
}
// Creating a BigQuery dataset resource
resource "google_bigquery_dataset" "dataset" {
  dataset_id = "nyt_covid_dataset"
  description = "New York Times Covid Dataset"
  location = "US"
}
// Creating a BigQuery table resource
resource "google_bigquery_table" "table" {
  deletion_protection = false
  dataset_id = google_bigquery_dataset.dataset.dataset_id
  table_id = "nyt_covid_count_by_state"
}
The example above reads the table schema from a file, so create a folder named 'bigquery' in your repo and add a schema JSON file to it (nyt_covid_count_by_state_schema.json in the example above), such as the one below:
[
  {
    "description": "Date",
    "mode": "NULLABLE",
    "name": "date",
    "type": "DATE"
  },
  {
    "description": "Name of State",
    "mode": "NULLABLE",
    "name": "state_name",
    "type": "STRING"
  },
  {
    "description": "State Identifier",
    "mode": "NULLABLE",
    "name": "state_fips_code",
    "type": "INTEGER"
  },
  {
    "description": "Confirmed Number of Cases",
    "mode": "NULLABLE",
    "name": "confirmed_cases",
    "type": "INTEGER"
  },
  {
    "description": "Number of Deaths",
    "mode": "NULLABLE",
    "name": "deaths",
    "type": "INTEGER"
  }
]
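Once the empty table exists, data can be loaded into it. As a rough sketch, a load job could be declared with the Google provider's `google_bigquery_job` resource. The job ID and the Cloud Storage URI are hypothetical, the CSV format is an assumption about the source file, and the dataset and table IDs are taken from the example above.

```hcl
# Sketch only: load a CSV file from Cloud Storage into the example table.
resource "google_bigquery_job" "load_nyt_covid" {
  project  = local.project_id
  job_id   = "load_nyt_covid_by_state" # hypothetical; job IDs must be unique within a project
  location = "US"

  load {
    # Hypothetical bucket and object name
    source_uris = ["gs://example-bucket/nyt_covid_by_state.csv"]

    destination_table {
      project_id = local.project_id
      dataset_id = "nyt_covid_dataset"
      table_id   = "nyt_covid_count_by_state"
    }

    source_format     = "CSV"
    skip_leading_rows = 1              # skip the CSV header row
    write_disposition = "WRITE_APPEND" # append rows to the existing table
  }

  # The IDs above are literals, so declare the ordering explicitly.
  depends_on = [module.bigquery]
}
```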
Features
This module provisions a dataset, a list of tables with associated JSON schemas, and views defined by queries.
Inputs
Name | Description | Type | Default | Required |
---|---|---|---|---|
access | An array of objects that define dataset access for one or more entities. | any | [ | no |
dataset_id | Unique ID for the dataset being provisioned. | string | n/a | yes |
dataset_labels | Key-value pairs in a map for dataset labels. | map(string) | {} | no |
dataset_name | Friendly name for the dataset being provisioned. | string | null | no |
default_table_expiration_ms | TTL of tables using the dataset, in milliseconds. | number | null | no |
delete_contents_on_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | bool | null | no |
deletion_protection | Whether or not to allow Terraform to destroy the instance. Unless this field is set to false in Terraform state, a terraform destroy or terraform apply that would delete the instance will fail. | bool | false | no |
description | Dataset description. | string | null | no |
encryption_key | Default encryption key to apply to the dataset. Defaults to null (Google-managed). | string | null | no |
external_tables | A list of objects which include table_id, expiration_time, external_data_configuration, and labels. | list(object({ | [] | no |
location | The regional location for the dataset. Only US and EU are allowed in this module. | string | "US" | no |
project_id | Project where the dataset and table are created. | string | n/a | yes |
routines | A list of objects which include routine_id, routine_type, routine_language, definition_body, return_type, routine_description and arguments. | list(object({ | [] | no |
tables | A list of objects which include table_id, schema, clustering, time_partitioning, range_partitioning, expiration_time and labels. | list(object({ | [] | no |
views | A list of objects which include table_id, which is the view ID, and the view query. | list(object({ | [] | no |
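Only `dataset_id` and `project_id` are required, so a much smaller call than the full example is possible. The sketch below reuses the module source and version from the example usage and sets a few of the optional scalar inputs from the Inputs table. The module name, dataset ID, label, and expiration values are illustrative assumptions, and `local.project_id` is assumed to be defined elsewhere.

```hcl
module "bigquery_minimal" {
  source  = "app.terraform.io/Seagen/bigquery/google"
  version = "5.2.0"

  # Required inputs
  dataset_id = "example_dataset" # hypothetical dataset ID
  project_id = local.project_id  # assumed local value

  # Optional inputs (illustrative values)
  location                    = "EU"    # the module allows only US and EU
  default_table_expiration_ms = 3600000 # one hour, in milliseconds
  delete_contents_on_destroy  = true    # let destroy remove tables with the dataset

  dataset_labels = {
    env = "dev"
  }
}
```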
Outputs
Name | Description |
---|---|
bigquery_dataset | BigQuery dataset resource. |
bigquery_external_tables | Map of BigQuery external table resources being provisioned. |
bigquery_tables | Map of BigQuery table resources being provisioned. |
bigquery_views | Map of BigQuery view resources being provisioned. |
external_table_ids | Unique IDs for any external tables being provisioned. |
external_table_names | Friendly names for any external tables being provisioned. |
project | Project where the dataset and tables are created. |
routine_ids | Unique IDs for any routine being provisioned. |
table_ids | Unique ID for the table being provisioned. |
table_names | Friendly name for the table being provisioned. |
view_ids | Unique ID for the view being provisioned. |
view_names | Friendly name for the view being provisioned. |
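These outputs can be read from the calling configuration through the usual `module.<name>.<output>` reference. A minimal sketch, assuming the module block is named `bigquery` as in the example usage; the output block names are arbitrary choices for illustration:

```hcl
# Re-export selected module outputs from the root configuration.
output "covid_table_ids" {
  description = "IDs of the tables created by the bigquery module"
  value       = module.bigquery.table_ids
}

output "covid_dataset_project" {
  description = "Project where the dataset and tables are created"
  value       = module.bigquery.project
}
```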