cwrw / dfe-analytics

Emit standard analytics events from your Rails application

DfE::Analytics

👉 Send every web request and model update to BigQuery

✋ Skip or anonymise fields containing PII

✌️ Configure and forget

Overview

This gem provides an opinionated integration with Google BigQuery.

Once it is set up, every web request and database update (as permitted by configuration) will flow to BigQuery.

It also provides a Rake task for backfilling BigQuery with models created before you started sending events (see Importing existing data below), and one for keeping your field configuration up to date.

To set the gem up, follow the steps in "Configuration" below.

Names and jargon

A Rails model is an analytics Entity.

A change to a model (including creation and deletion) is an analytics Event. When a model changes we send the entire new state of the model as part of the event.

A web request is also an analytics Event.
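
As a rough illustration of these terms, the sketch below builds an event for a model change that carries the model's full new state. The helper name `build_entity_event` and every key in the hash are hypothetical, chosen for readability; they are not taken from the gem's actual event schema.

```ruby
require 'json'
require 'securerandom'
require 'time'

# Hypothetical helper: wraps the full new state of a model in an event hash.
# Key names are illustrative only, not the gem's real schema.
def build_entity_event(event_type:, entity_name:, attributes:, request_uuid: nil)
  {
    'event_type' => event_type,             # e.g. a create, update or delete
    'entity_name' => entity_name,           # which model (Entity) changed
    'request_uuid' => request_uuid,         # ties the change to a web request
    'occurred_at' => Time.now.utc.iso8601,
    'data' => attributes                    # the entire new state of the model
  }
end

event = build_entity_event(
  event_type: 'update',
  entity_name: 'courses',
  attributes: { 'id' => 1, 'title' => 'Physics' },
  request_uuid: SecureRandom.uuid
)

puts JSON.pretty_generate(event)
```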

Architecture

sequenceDiagram
    participant Client
    participant Analytics middleware
    participant Controller
    participant Model
    participant RequestStore
    Client->>+Controller: GET /index
    activate Controller
    Analytics middleware-->>RequestStore: Store request UUID
    Controller->>Model: Update model
    Model->>Analytics: after_update hook
    Analytics-->>RequestStore: Retrieve request UUID
    Analytics->>ActiveJob: enqueue Event with serialized model state and request UUID
    Controller->>Analytics: after_action to send request event
    Analytics->>ActiveJob: enqueue Event with serialized request and request UUID
    Controller->>Client: 200 OK
    deactivate Controller
    ActiveJob->>ActiveJob: pump serialized Events to BigQuery
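
The flow in the diagram can be sketched in plain Ruby without Rails. Everything here is a stand-in and an assumption: `RequestContext` plays the role of RequestStore, `QUEUE` plays the role of ActiveJob, and `after_update_hook` imitates a model callback. It only shows how a request UUID stored by middleware ends up on both the entity event and the web request event.

```ruby
require 'securerandom'

# Stand-in for RequestStore: per-thread storage, cleared after each request.
module RequestContext
  def self.store
    Thread.current[:request_context] ||= {}
  end

  def self.clear!
    Thread.current[:request_context] = {}
  end
end

QUEUE = [] # stand-in for the ActiveJob queue

# Rack-style middleware: tags the request with a UUID before the app runs.
class RequestUuidMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    RequestContext.store[:request_uuid] = SecureRandom.uuid
    @app.call(env)
  ensure
    RequestContext.clear!
  end
end

# Stand-in for a model's after_update hook: reads the UUID and enqueues an event.
def after_update_hook(model_state)
  QUEUE << { type: :entity, request_uuid: RequestContext.store[:request_uuid], data: model_state }
end

# Stand-in for a controller action plus its after_action callback.
app = lambda do |env|
  after_update_hook({ 'id' => 1, 'name' => 'updated' })
  QUEUE << { type: :web_request, request_uuid: RequestContext.store[:request_uuid], path: env['PATH_INFO'] }
  [200, {}, ['OK']]
end

status, = RequestUuidMiddleware.new(app).call('PATH_INFO' => '/index')
puts "status=#{status}, events queued=#{QUEUE.size}"
```

Because both events carry the same UUID, model changes can later be joined back to the web request that caused them.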

Dependencies

A Rails app with ActiveJob configured.

Installation

gem 'dfe-analytics'

then

bundle install

Configuration

1. Configure BigQuery connection, feature flags etc

bundle exec rails generate dfe:analytics:install

and follow comments in config/initializers/dfe-analytics.yml.

The dfe:analytics:install generator will also initialize some empty config files:

| Filename | Purpose |
| --- | --- |
| config/analytics.yml | List all fields we will send to BigQuery |
| config/analytics_pii.yml | List all fields we will obfuscate before sending to BigQuery. This should be a subset of fields in analytics.yml |
| config/analytics_blocklist.yml | Autogenerated file to list all fields we will NOT send to BigQuery, to support the dfe:analytics:check task |
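
The subset rule between the two hand-edited files can be checked mechanically. The sketch below assumes a layout of table name to field list under a top-level `shared` key; that layout, the sample tables and the helper `pii_fields_missing_from_allowlist` are assumptions for illustration, so compare them against the files the generator actually produces.

```ruby
require 'yaml'

# Assumed layout: top-level "shared" key, then table name => list of fields.
ANALYTICS_YML = <<~YAML
  shared:
    users:
      - id
      - email
      - created_at
YAML

ANALYTICS_PII_YML = <<~YAML
  shared:
    users:
      - email
YAML

# Returns "table.field" for every PII field that is not also in the allow-list.
def pii_fields_missing_from_allowlist(allowlist, pii)
  pii.flat_map do |table, fields|
    (fields - allowlist.fetch(table, [])).map { |field| "#{table}.#{field}" }
  end
end

allowlist = YAML.safe_load(ANALYTICS_YML).fetch('shared')
pii = YAML.safe_load(ANALYTICS_PII_YML).fetch('shared')
stray = pii_fields_missing_from_allowlist(allowlist, pii)

if stray.empty?
  puts 'analytics_pii.yml is a subset of analytics.yml'
else
  puts "Listed as PII but not sent at all: #{stray.join(', ')}"
end
```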

2. Check your fields

A good place to start is to run

bundle exec rails dfe:analytics:regenerate_blocklist

to populate analytics_blocklist.yml. Work through this file to move entries into analytics.yml and optionally also to analytics_pii.yml.

Finally, run

bundle exec rails dfe:analytics:check

This will report any fields that are present in a model but missing from the config, or present in the config but missing from the model.

It's recommended to run this task regularly - at least as often as you run database migrations. Consider enhancing db:migrate to run it automatically.
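
One way to do this, sketched below, uses Rake's `enhance` to append the check to an existing task. The helper name `run_after` and the idea of placing this in a file under lib/tasks are assumptions; the guard simply skips the hook when `db:migrate` has not been loaded.

```ruby
require 'rake'

# Hypothetical helper: make `follow_up` run after `target` has finished.
def run_after(target, follow_up)
  Rake::Task[target].enhance do
    Rake::Task[follow_up].invoke
  end
end

# In a Rails app both tasks are already defined by the time lib/tasks loads.
run_after('db:migrate', 'dfe:analytics:check') if Rake::Task.task_defined?('db:migrate')
```

With this in place, a failing field check surfaces immediately after a migration rather than at the next manual run.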

3. Enable callbacks

Mix in the following modules. It's recommended to include them at the highest possible level in the inheritance hierarchy of your controllers and models so that they are effective everywhere. A standard Rails application will have all controllers inheriting from ApplicationController and all models inheriting from ApplicationRecord, so these should be a good place to start.

Controllers

class ApplicationController < ActionController::Base
  include DfE::Analytics::Requests

  # This method MUST be present in your controller and should return
  # either nil or an object implementing an .id method.
  #
  # def current_user; end

  # This method MAY be present in your controller. If so, it should
  # return a string - return value will be attached to web_request events.
  #
  # def current_namespace; end
end

Models

class ApplicationRecord < ActiveRecord::Base
  include DfE::Analytics::Entities
end

If everything has worked, you should see jobs flowing into your queues on each web request and model update. While you’re setting things up, consider setting the config options async: false and log_only: true to take ActiveJob and BigQuery (respectively) out of the loop.

Importing existing data

Run

bundle exec rails dfe:analytics:import_all_entities

To reimport just one model, run:

bundle exec rails dfe:analytics:import_entity[ModelName]

Contributing

Make a copy of this repository, run bundle install, then bundle exec rspec to run the tests.

License

The gem is available as open source under the terms of the MIT License.
