livolleyball / BigQueryToTableauExtractor

This solution is a three-step, script-based approach to refreshing Tableau Hyper extracts from Google BigQuery datasets at high speed. It also supports Google service accounts.

BigQuery To Tableau Extractor

all.py: streamlines the whole workflow in a single script.

  • uses the Python package google-cloud-storage in place of gsutil.
  • waits for the job to finish after the task is submitted.
  • reads settings from a config file instead of command-line parameters.
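As a rough illustration of the config-file idea, here is a minimal sketch using Python's standard configparser. The section and key names (bigquery, storage, project, bucket and so on) are made up for this example; the actual layout expected by all.py may be different.

```python
import configparser

# Hypothetical section and key names; the real config layout in all.py may differ.
EXAMPLE_CONFIG = """
[bigquery]
project = my-project
dataset = my_dataset
table = my_table

[storage]
bucket = my-export-bucket
local_dir = /data/extracts
"""


def load_settings(text):
    """Parse INI-style text into a plain nested dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = load_settings(EXAMPLE_CONFIG)
print(settings["bigquery"]["table"])
```

Keeping the settings in one file like this means the scheduled job can be re-pointed at another table or bucket without editing the script.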

This is a 3-step script solution to extract data from Google BigQuery at "lightning" speeds :)

It basically works in three steps:

1 - File: 1_export_data.py

Using the Google BigQuery API (Python client library), export data from BigQuery and store it in a Google Cloud Storage bucket (e.g. as multiple CSV files).
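The export step could look roughly like the sketch below, which uses the google-cloud-bigquery client. It is not the code from 1_export_data.py: the function names, the "US" default location, and the file-name pattern are assumptions for illustration. The wildcard in the destination URI is what lets BigQuery split a large table across several files.

```python
def build_export_uri(bucket, prefix):
    """Return a wildcard URI so BigQuery can shard the export into many files."""
    return f"gs://{bucket}/{prefix}-*.csv"


def export_table(table_id, bucket, prefix, location="US"):
    """Start an extract job and block until it finishes.

    table_id is a "project.dataset.table" string. All argument values used
    with this function are placeholders, not values from the repository.
    """
    # Imported here so the helper above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.ExtractJobConfig()
    job_config.destination_format = bigquery.DestinationFormat.CSV

    extract_job = client.extract_table(
        table_id,
        build_export_uri(bucket, prefix),
        location=location,
        job_config=job_config,
    )
    extract_job.result()  # Waits for the job to complete; raises on failure.
    return extract_job.state
```

A call such as export_table("my-project.my_dataset.my_table", "my-export-bucket", "exports/my_table") would need valid Google credentials in the environment, for example a service account key.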

2 - File: 2_download_bucket_files.py

Using the Google Cloud Storage gsutil tool, download all files from a bucket with multi-threading, so the download runs in parallel and makes full use of the available network bandwidth.
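One way to drive gsutil from Python is sketched below. The helper names and the bucket, prefix, and directory values are hypothetical; the only gsutil features relied on are the cp command and the top-level -m option, which runs the copy in parallel.

```python
import subprocess


def gsutil_copy_command(bucket, prefix, dest_dir):
    """Build a gsutil command that copies every object under a prefix.

    The top-level -m flag asks gsutil to run the copy in parallel.
    """
    return ["gsutil", "-m", "cp", f"gs://{bucket}/{prefix}*", dest_dir]


def download_bucket_files(bucket, prefix, dest_dir):
    """Run the copy and raise CalledProcessError if gsutil exits non-zero."""
    subprocess.run(gsutil_copy_command(bucket, prefix, dest_dir), check=True)
```

Because the command is passed as a list rather than through a shell, the * wildcard reaches gsutil untouched and gsutil expands it against the bucket.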

Make sure you have a blazing fast SSD drive attached to your machine, as well as very good bandwidth (e.g. 500 MBPS or more), as these two factors will greatly impact the performance of the overall extract solution. After all, we are talking about downloading tens of GBs of data over the web and storing it locally. Disk I/O and network latency and throughput are serious bottlenecks.
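To get a feel for why bandwidth matters, here is a back-of-the-envelope calculation. It assumes "MBPS" means megabits per second and uses decimal gigabytes; it ignores latency, protocol overhead, and disk speed, so real transfers will take longer.

```python
def transfer_seconds(size_gb, link_mbps):
    """Ideal transfer time: decimal gigabytes over a megabits-per-second link."""
    bits = size_gb * 8 * 1000**3
    return bits / (link_mbps * 1000**2)


print(transfer_seconds(20, 500))  # 320.0 seconds, a bit over five minutes
```

The same 20 GB over a 100 Mbit/s link would take five times as long under these assumptions.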

3 - File: RefreshExtractByName.py

Now that all CSVs are local and visible to Tableau Server, we use Tableau's REST API, more specifically the Tableau Server Client (Python), to kick off an extract refresh from these files.
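A refresh by name with the Tableau Server Client library might look like the sketch below. It assumes the extract is a published data source and that username/password sign-in is used; RefreshExtractByName.py may target a workbook or authenticate differently. The function and argument names are illustrative only.

```python
def pick_single(items, name):
    """Return the only item whose .name equals name, or raise ValueError."""
    matches = [item for item in items if item.name == name]
    if len(matches) != 1:
        raise ValueError(
            f"expected one data source named {name!r}, found {len(matches)}"
        )
    return matches[0]


def refresh_extract_by_name(server_url, username, password, site_id, name):
    """Sign in, find a published data source by name, and queue a refresh."""
    # Imported here so pick_single works without the library installed.
    import tableauserverclient as TSC

    auth = TSC.TableauAuth(username, password, site_id=site_id)
    server = TSC.Server(server_url, use_server_version=True)
    with server.auth.sign_in(auth):
        options = TSC.RequestOptions()
        options.filter.add(
            TSC.Filter(
                TSC.RequestOptions.Field.Name,
                TSC.RequestOptions.Operator.Equals,
                name,
            )
        )
        datasources, _ = server.datasources.get(options)
        job = server.datasources.refresh(pick_single(datasources, name))
        return job.id  # Identifier of the queued refresh job.
```

Failing loudly when the name matches zero or several data sources avoids silently refreshing the wrong extract.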


We've also included a fourth Python file (RunThis_1_2_3_in_one.py), which wraps the three scripts above into a single Python file. You should run this one under a scheduler (e.g. cron jobs, Windows Task Scheduler). At the moment, though, the configuration is spread across the individual files (I know, this is not a best practices course at all ;) )
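If you would rather chain the three scripts than merge them, a small runner like the following sketch is one option. This is not how RunThis_1_2_3_in_one.py is written; the function name and the injectable runner argument are assumptions made so the example stays easy to test.

```python
import subprocess
import sys

# Script names as listed in this README, in run order.
STEPS = ["1_export_data.py", "2_download_bucket_files.py", "RefreshExtractByName.py"]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order; check=True aborts on the first failure."""
    completed = []
    for script in steps:
        # Use the same interpreter that is running this wrapper.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling run_pipeline() from a scheduled job stops at the first failing step, so a failed export never triggers a refresh of stale files.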

The full instructions on how to use this code are here: https://community.tableau.com/docs/DOC-23161

License:MIT License


Languages

Language:Python 100.0%