README
Google Analytics 360 Flattener. A Google Cloud Platform (GCP) solution that unnests (flattens) Google Analytics data stored in BigQuery. The GCP resources for the solution are installed via Deployment Manager.
Local dependencies
- Python 3.7 or higher as the base interpreter
- Create a virtual environment
- Install the Python packages listed in cf/requirements.txt
Directories
- cf : Pub/Sub-triggered cloud function that executes a destination query to unnest (flatten) the .ga_sessions_yyyymmdd table into 5 tables immediately upon its arrival in BigQuery:
- ga_flat_sessions_yyyymmdd
- ga_flat_hits_yyyymmdd
- ga_flat_products_yyyymmdd
- ga_flat_experiments_yyyymmdd
- ga_flat_promotions_yyyymmdd
- tests : unit tests for both the cloud functions and the Deployment Manager templates
- cfconfigbuilder(ps) : cloud function that finds all BigQuery datasets that have a ga_sessions table and adds them to the default configuration in Google Cloud Storage, at the following location: gs://[Deployment Name]-[PROJECT_NUMBER]-adswerve-ga-flat-config/config_datasets.json
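To make "unnest (flatten)" concrete, here is a minimal conceptual sketch in plain Python of what happens to one session record: the session-level fields become one row, and each element of the repeated hits field becomes its own row carrying the session keys, so the flat tables can be joined back together. This is an illustration only, not the cloud function's code (which runs a BigQuery destination query). The field names are a small subset of the GA 360 export schema, and the page_ column prefix and the flatten_session helper are made up for this example. In the export schema, products, promotions and experiments are repeated fields nested inside each hit and would be unnested the same way, one level deeper.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten_session(session: Row) -> Tuple[Row, List[Row]]:
    """Split one nested GA session into a flat session row and flat hit rows.

    Conceptual sketch only; the real solution does this in BigQuery SQL.
    """
    # Keys identifying the session; copied onto every hit row so the
    # flat tables can be joined back together.
    key = {"fullVisitorId": session["fullVisitorId"], "visitId": session["visitId"]}

    # Session-level row: every field except the repeated "hits" array.
    session_row = {k: v for k, v in session.items() if k != "hits"}

    # One output row per element of the repeated "hits" field (the UNNEST step).
    hit_rows = []
    for hit in session.get("hits", []):
        row = dict(key)
        row["hitNumber"] = hit["hitNumber"]
        # Fields of the nested "page" record become prefixed columns
        # (hypothetical naming, for illustration only).
        for field, value in hit.get("page", {}).items():
            row["page_" + field] = value
        hit_rows.append(row)
    return session_row, hit_rows


if __name__ == "__main__":
    nested = {
        "fullVisitorId": "123",
        "visitId": 1,
        "hits": [
            {"hitNumber": 1, "page": {"pagePath": "/"}},
            {"hitNumber": 2, "page": {"pagePath": "/cart"}},
        ],
    }
    sessions, hits = flatten_session(nested)
    print(sessions)
    print(hits)
```

One nested record with two hits yields one session row and two hit rows, which is the same shape change the solution applies across a whole daily table.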
Files
- dm_helper.py: provides consistent names for GCP resources across the solution. Configuration values and constants are also defined in the class in this file.
- dmt_*: files prefixed with dmt_ are Python-based Deployment Manager templates
- ga_flattener.yaml: Deployment Manager configuration file. The entire solution is packaged in this file, which is passed to the deployment manager create command.
- tools/pubsub_message_publish.py : Python-based utility that publishes a message simulating an event monitored in GCP logging. Useful for smoke testing and for backfilling historical data.
- LICENSE: BSD 3-Clause open source license
Prerequisites
- Create a GCP project, or use an existing project that has Google Analytics data flowing to it. Referred to below as [PROJECT_ID].
- Enable the Cloud Build API
- Enable the Cloud Functions API
- Enable the Identity and Access Management (IAM) API
- Grant the "Logs Configuration Writer", "Cloud Functions Developer" and "Pub/Sub Admin" predefined IAM roles to [PROJECT_NUMBER]@cloudservices.gserviceaccount.com (a built-in service account); otherwise deployment will fail with permission errors. See https://cloud.google.com/deployment-manager/docs/access-control for a detailed explanation.
- Install the gcloud CLI locally or use Cloud Shell.
- Clone this GitHub repository.
- Create a bucket for staging code during deployment, for example: [PROJECT_NUMBER]-function-code-staging. Referred to below as [BUCKET_NAME].
- Edit the ga_flattener.yaml file, specifically the properties-->codeBucket value of the function and httpfunction resources. Set both values to [BUCKET_NAME] (see previous step). If you will deploy with ga_flattener_colon.yaml, make the same codeBucket edit there.
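The API, IAM and bucket prerequisites above can also be scripted. The sketch below assumes the gcloud CLI is installed and authenticated; by default it only builds and prints the commands (dry run) so they can be reviewed before running them with dry_run=False. The service names and role IDs are the standard identifiers for the APIs and roles named above; the helper names are hypothetical, and the project ID, project number and bucket name in the usage example are placeholders.

```python
import shlex
import subprocess
from typing import List

# Service names of the APIs listed in the prerequisites:
# Cloud Build, Cloud Functions and Identity and Access Management (IAM).
APIS = [
    "cloudbuild.googleapis.com",
    "cloudfunctions.googleapis.com",
    "iam.googleapis.com",
]

# Role IDs for "Logs Configuration Writer", "Cloud Functions Developer"
# and "Pub/Sub Admin".
ROLES = [
    "roles/logging.configWriter",
    "roles/cloudfunctions.developer",
    "roles/pubsub.admin",
]


def prerequisite_commands(project_id: str, project_number: str,
                          bucket_name: str) -> List[List[str]]:
    """Build the gcloud commands for the prerequisites as argument lists."""
    member = f"serviceAccount:{project_number}@cloudservices.gserviceaccount.com"
    commands = [["gcloud", "services", "enable", *APIS, f"--project={project_id}"]]
    # One IAM policy binding per predefined role.
    for role in ROLES:
        commands.append(["gcloud", "projects", "add-iam-policy-binding", project_id,
                         f"--member={member}", f"--role={role}"])
    # Staging bucket for function code ([BUCKET_NAME]).
    commands.append(["gcloud", "storage", "buckets", "create",
                     f"gs://{bucket_name}", f"--project={project_id}"])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        # shlex.quote keeps this Python 3.7 compatible (shlex.join needs 3.8).
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; replace with your [PROJECT_ID], [PROJECT_NUMBER], [BUCKET_NAME].
    run(prerequisite_commands("my-project", "123456789012",
                              "123456789012-function-code-staging"))
```

Building argument lists instead of shell strings avoids quoting problems when the commands are finally executed through subprocess.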
Installation steps
- Execute command: gcloud config set project [PROJECT_ID]
- Execute command: gcloud config set account username@domain.com
- Navigate (locally) to the root directory of this repository
- If [PROJECT_ID] does NOT contain a colon (:), execute command:
  - gcloud deployment-manager deployments create [Deployment Name] --config ga_flattener.yaml
- Otherwise, follow these steps:
  - Execute command: gcloud deployment-manager deployments create [Deployment Name] --config ga_flattener_colon.yaml
  - Trigger the function named [Deployment Name]-cfconfigbuilderps with a blank message. It will create the necessary configuration file in the application's Google Cloud Storage bucket.
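The choice between the two configuration files depends only on whether [PROJECT_ID] contains a colon, so it can be captured in a small helper. This is a hypothetical convenience sketch, not part of the repository: it returns the create command for a given project ID and, for colon-containing IDs, the extra manual trigger step. The project ID and deployment name in the usage example are placeholders.

```python
from typing import List


def deployment_create_command(project_id: str, deployment_name: str) -> List[str]:
    """Return the create command, choosing the config file by project ID."""
    # Project IDs containing a colon (domain-scoped IDs such as
    # "example.com:my-project") use ga_flattener_colon.yaml.
    config = "ga_flattener_colon.yaml" if ":" in project_id else "ga_flattener.yaml"
    return ["gcloud", "deployment-manager", "deployments", "create",
            deployment_name, "--config", config]


def install_plan(project_id: str, deployment_name: str) -> List[str]:
    """List the installation steps described above as printable strings."""
    steps = [" ".join(deployment_create_command(project_id, deployment_name))]
    if ":" in project_id:
        # Only the colon path needs the configuration builder triggered by hand.
        steps.append(f"Trigger {deployment_name}-cfconfigbuilderps with a blank message")
    return steps


if __name__ == "__main__":
    for step in install_plan("example.com:my-project", "gaflat"):
        print(step)
```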
Verification steps
- After installation, a configuration file named config_datasets.json will exist in gs://[Deployment Name]-[PROJECT_NUMBER]-adswerve-ga-flat-config/ (a Cloud Storage bucket within [PROJECT_ID]). This file lists all the datasets that have "ga_sessions_yyyymmdd" tables and which tables to unnest. This configuration is required for the cloud function to execute.
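To verify the configuration by hand, the sketch below builds the file's path from the bucket naming pattern above and computes which datasets you would expect it to list, given a mapping of dataset names to table names (for example assembled from bq ls output). The mapping and both helper functions are hypothetical, and treating only daily shards named ga_sessions_yyyymmdd as qualifying is an assumption about how the configuration builder matches tables. The sketch does not assume anything about the JSON layout of config_datasets.json; the file itself can be printed with gcloud storage cat or gsutil cat on the returned path.

```python
import re
from typing import Dict, Iterable, List

# Daily GA 360 export shards, e.g. ga_sessions_20200131
# (intraday tables such as ga_sessions_intraday_20200131 do not match).
GA_SESSIONS_SHARD = re.compile(r"ga_sessions_\d{8}")


def config_uri(deployment_name: str, project_number: str) -> str:
    """Path of config_datasets.json, following the bucket naming shown above."""
    bucket = f"{deployment_name}-{project_number}-adswerve-ga-flat-config"
    return f"gs://{bucket}/config_datasets.json"


def datasets_expected_in_config(tables_by_dataset: Dict[str, Iterable[str]]) -> List[str]:
    """Return, sorted, the datasets that hold at least one ga_sessions_yyyymmdd table."""
    return sorted(
        dataset
        for dataset, tables in tables_by_dataset.items()
        if any(GA_SESSIONS_SHARD.fullmatch(name) for name in tables)
    )


if __name__ == "__main__":
    print(config_uri("gaflat", "123456789012"))
    print(datasets_expected_in_config({
        "analytics_a": ["ga_sessions_20200101", "ga_sessions_20200102"],
        "analytics_b": ["ga_sessions_intraday_20200102"],
        "other": ["orders"],
    }))
```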
Testing / Simulating Event
- Modify the values in lines 7-17 of tools/pubsub_message_publish.py to match your environment.
- Run tools/pubsub_message_publish.py locally. It publishes a simulated logging event of GA data being ingested into BigQuery. Then check the dataset for date-sharded tables named:
- ga_flat_experiments_(x)
- ga_flat_hits_(x)
- ga_flat_products_(x)
- ga_flat_promotions_(x)
- ga_flat_sessions_(x)
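After the simulated event, the check above can be scripted. The sketch below reports which of the five expected ga_flat_*_yyyymmdd tables are missing for a given date, given the table names currently in the dataset (for example from the BigQuery console or bq ls); the helper names are hypothetical. Because config_datasets.json also controls which tables are unnested, a missing table may simply be disabled in the configuration rather than indicate a failure.

```python
import re
from typing import Iterable, List

# The five flat table families created by the cf function.
FLAT_TABLE_KINDS = ("sessions", "hits", "products", "experiments", "promotions")


def expected_flat_tables(date_suffix: str) -> List[str]:
    """Names of the flat tables for one yyyymmdd shard."""
    if not re.fullmatch(r"\d{8}", date_suffix):
        raise ValueError(f"expected a yyyymmdd suffix, got {date_suffix!r}")
    return [f"ga_flat_{kind}_{date_suffix}" for kind in FLAT_TABLE_KINDS]


def missing_flat_tables(existing_tables: Iterable[str], date_suffix: str) -> List[str]:
    """Expected flat tables for date_suffix that are absent from existing_tables."""
    existing = set(existing_tables)
    return [name for name in expected_flat_tables(date_suffix) if name not in existing]


if __name__ == "__main__":
    tables_in_dataset = ["ga_sessions_20200101", "ga_flat_sessions_20200101",
                         "ga_flat_hits_20200101"]
    print(missing_flat_tables(tables_in_dataset, "20200101"))
```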
Un-install steps
- Optional command to remove the solution:
- gcloud deployment-manager deployments delete [Deployment Name] -q
Common install errors
- Message: Step #2: AccessDeniedException: 403 [PROJECT_NUMBER]@cloudbuild.gserviceaccount.com does not have storage.objects.list access to the Google Cloud Storage bucket.
- Resolution: Ensure the Cloud Storage bucket name configured in the "codeBucket" setting of ga_flattener*.yaml is correct. [PROJECT_NUMBER]@cloudbuild.gserviceaccount.com only requires the predefined GCP role Cloud Build Service Account.