GDPR-Tools

Sanitize any PHP application HTML response to be GDPR-compliant.

Table of Contents

Installation
About GDPR-Tools
How Does It Work
Changelog
Documentation

If you like this project, giving it a ⭐ would be appreciated!

Do you feel like reading? Check out the full API on the documentation website at
marwanalsoltany.github.io/gdpr-tools.

Key Features

Zero dependencies
Minimal, intuitive and easy to get along with
Integrates easily in any application

Installation

Using Composer:

composer require marwanalsoltany/gdpr-tools

Using Git:

git clone https://github.com/MarwanAlsoltany/gdpr-tools.git

Using Source:

Download GDPR-Tools as a .zip or .tar.gz and extract it wherever you like on your web server.

About GDPR-Tools

GDPR-Tools is a simple and fast way to help make an application GDPR-compliant. In short, GDPR (the European Union's General Data Protection Regulation) is a set of rules and regulations governing how the personal data of individuals in the EU is collected and processed.

Why does GDPR-Tools exist?

Normally, if you are building a new application, you should make it GDPR-compliant as you build it, but this is often not the case. If you have an application that is already built and you want to make it GDPR-compliant, you have to go through the code again to see which elements load external resources and implement a way to make them load only after client consent. There are also other things like <iframe /> elements and other embeddable resources that get added by editors or plugins (in a CMS context, for example). To make the app GDPR-compliant, all these resources must be blocked (depending on their category) until the user gives their consent.

Blocking these resources on the client side using JavaScript is not possible, because the browser does not offer any way to stop or prevent the request for an external resource.

GDPR-Tools was created to solve that specific problem: making sure the HTML returned by the server does not make any requests to external resources (3rd-party services) before the user gives their consent on the client side. It does that by sanitizing all HTML elements that load external resources (scripts, stylesheets, images, etc.). The sanitization replaces the values of the attributes that load the resource with new values and stores the old values in data- attributes to be handled later by client-side code.
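
To make the attribute swap described above more concrete, here is a minimal JavaScript sketch. It is not the library's implementation (the real sanitization happens in PHP on the server): the `sanitizeAttribute` helper and the plain object standing in for a parsed element are assumptions made purely for illustration. Only the data-consent-* attribute names are taken from this document.

```javascript
// Hypothetical helper, not part of GDPR-Tools: illustrates the attribute swap
// on a plain object that stands in for a parsed HTML element.
function sanitizeAttribute(element, attribute, placeholder) {
  const original = element.attributes[attribute];

  return {
    tag: element.tag,
    attributes: {
      ...element.attributes,
      // the attribute that would trigger the request now points to a harmless value
      [attribute]: placeholder,
      // the original value is kept in data- attributes for client-side code
      'data-consent-element': element.tag,
      'data-consent-attribute': attribute,
      'data-consent-value': original,
    },
  };
}

const iframe = {
  tag: 'iframe',
  attributes: { src: 'https://www.youtube.com/embed/VIDEO_ID' },
};

const blocked = sanitizeAttribute(iframe, 'src', 'data:text/html,Consent%20Please');

console.log(blocked.attributes.src);                   // data:text/html,Consent%20Please
console.log(blocked.attributes['data-consent-value']); // https://www.youtube.com/embed/VIDEO_ID
```

Because the original URL only survives inside a data- attribute, the browser has nothing to request until client-side code puts it back.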

Note: ~~GDPR-Tools currently takes care only of blocking requests to external resources.~~ Starting from v1.2.0, GDPR-Tools can also block inline scripts. In fact, it can block/modify any attribute as long as it is instructed how to do that.

Fact: ~~GDPR-Tools is not a plug-and-play solution, it takes care only of the server side part, you still have to implement the client side part. See consent.js to get started.~~ Starting from v1.2.0, GDPR-Tools also provides a client side SDK that can be integrated with any CMP using a simple config.

How Does it Work?

You can use GDPR-Tools in three ways:

1) Using Some App Life-Cycle Event.

The first and recommended way is to listen to an event that fires before the response is sent back to the client. For example, in a Symfony application, this would be the kernel.response event. Note that you have to make sure that the GDPR-Tools listener is the last listener. The following code snippet demonstrates a slimmed down version of how to do that:

public function onKernelResponse(\Symfony\Component\HttpKernel\Event\ResponseEvent $event)
{
    $response = $event->getResponse();
    $content  = $response->getContent();

    $sanitizedContent = $this->sanitizedContent($content);

    $response->setContent($sanitizedContent);
}

private function sanitizedContent(string $content): string
{
    // the condition that determines whether to sanitize the content or not
    $condition = function ($data) {
        // only html responses
        // or you can also check for some consent cookie here
        return stripos($data, '<!DOCTYPE html>') !== false;
    };

    // the temporary URIs/URLs to set for the sanitized elements
    $uris = [
        'link'   => sprintf('data:text/css;charset=UTF-8;base64,%s', base64_encode('body::after{content:"Consent Please";color:orangered}')),
        'script' => sprintf('data:text/javascript;charset=UTF-8;base64,%s', base64_encode('console.warn("Script Blocked!")')),
        'iframe' => sprintf('data:text/html;charset=UTF-8;base64,%s', base64_encode('<div>Consent Please!</div>')),
    ];

    $whitelist = [
        'cdn.your-cmp.com',
        'unpkg.com',
        'cdnjs.cloudflare.com',
    ];

    // the data to append to the final html
    $appends = [
        'body' => [
            '<script defer src="/path/to/client-side-code.js"></script>',
        ],
    ];

    $sanitizedHTML = (new \MAKS\GDPRTools\Backend\Sanitizer())
        ->setData($content)
        ->setCondition($condition)
        ->setURIs($uris)
        ->setWhitelist($whitelist)
        ->setAppends($appends)
        ->sanitize()
        ->get();

    // or simply
    // $sanitizedHTML = (new \MAKS\GDPRTools\Backend\Sanitizer())->sanitizeData($content, $condition, $uris, $whitelist, $appends);

    return $sanitizedHTML;
}

2) Using a Custom Proxy for App Entry

The second way is for when you don't have the luxury of using events. In this case, you can simply proxy the app entry point by making a new entry point that points to the old one and uses MAKS\GDPRTools\Backend\Sanitizer::sanitizeApp() to sanitize the response before sending it back to the client. The following code snippet demonstrates a slimmed down version of how to do that:

// first, you need to rename the application entry point to something else,
// let's say `index.php` is the entry point, so `index.php` becomes `app.php`

// second, make a new file with the same name as the old name of app entry point
// in our case it's `index.php`, the content of this new file would be something like this:

include '/path/to/gdpr-tools/src/Backend/Sanitizer.php';

// check out the `$condition`, `$uris`, `$whitelist`, and `$appends` variables from the previous example
// you can also add `$prepends` (similar to `$appends`) and `$injections` (modes: 'prepend', 'append', 'before', 'after')
\MAKS\GDPRTools\Backend\Sanitizer::sanitizeApp('./app.php', $condition, $uris, $whitelist, $appends);

Hint: The \MAKS\GDPRTools\Backend\Sanitizer class is well documented, check out the DocBlocks of its properties and methods to learn more.

3) Using The Preconfigured PHAR Package as a Proxy

The third way is to use the PHAR-Archive. It is available since v1.2.0 and is by far the simplest one. The PHAR-Archive (gdpr-tools.phar) is a complete package that includes the GDPR-Tools Backend and Frontend. You can use it to sanitize the response before sending it to the client using a simple config file (example gdpr-tools.config.php). The PHAR will sanitize the response (backend part), build the necessary JavaScript code that integrates with the used CMP (frontend part), and attach it to the final response to handle the consent on the client side. The following code snippet demonstrates how to do that:

// first, you need to download the PHAR archive (gdpr-tools.phar) from the releases page and add it to your web-server root directory

// second, you need to rename the application entry point to something else,
// let's say `index.php` is the entry point, so `index.php` becomes `app.php`

// third, make a new file with the same name as the old name of app entry point
// in our case it's `index.php`, the content of this new file would be something like this:

require './gdpr-tools.phar';

// finally, you need to add a config file (gdpr-tools.config.php)
// at the same level as (gdpr-tools.phar) and configure it as needed

Hint: Check out the comments on gdpr-tools.config.php fields, to learn more about the expected data-types. Also check out gdpr-tools.php to see an example of how to use the Frontend SDK in addition to Backend Sanitization.

Note: You should configure your web-server and make sure that requests to the renamed app entry (index.php that became app.php) are redirected to the newly created app entry (/app.php redirects to or loads index.php).

What Elements are Sanitized?

By default these elements (and attributes) will be sanitized if they point to a resource that is NOT on the same domain as the application (not same-origin):

<link href="" />
<script src="" />
<iframe src="" />
<embed src="" />
<img src="" srcset="" />
<audio src="" />
<video src="" poster="" />
<source src="" srcset="" />
<track src="" />
<object data="" />

Check out \MAKS\GDPRTools\Backend\Sanitizer::ELEMENTS to see all the elements that are sanitized by default.
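
The "not same-origin" rule above, combined with the whitelist from the earlier example, can be pictured with the following JavaScript sketch. It is only an illustration of the decision, not the Sanitizer's actual logic: the `shouldSanitize` helper, its parameters, and the subdomain matching of whitelist entries are all assumptions.

```javascript
// Hypothetical helper, not the library's implementation.
// appOrigin and whitelist are assumptions made for this sketch.
function shouldSanitize(resourceUrl, appOrigin, whitelist = []) {
  // relative URLs resolve against the application origin and are therefore same-origin
  const url = new URL(resourceUrl, appOrigin);

  // non-network schemes such as data: never leave the page
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return false;
  }

  if (url.origin === new URL(appOrigin).origin) {
    return false;
  }

  // assumption: a whitelist entry matches the host itself or any of its subdomains
  const whitelisted = whitelist.some(
    (host) => url.hostname === host || url.hostname.endsWith('.' + host)
  );

  return !whitelisted;
}

const app = 'https://example.com';
const allowed = ['unpkg.com', 'cdnjs.cloudflare.com'];

console.log(shouldSanitize('/assets/app.js', app, allowed));                  // false
console.log(shouldSanitize('https://unpkg.com/lib.js', app, allowed));        // false
console.log(shouldSanitize('https://www.youtube.com/embed/x', app, allowed)); // true
```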

Starting from v1.2.0, with the introduction of the JavaScript SDK, you can also prevent inline scripts from running (the Google Analytics script, for example). Because there is no way to determine whether an inline script is going to perform some action that requests an external resource and therefore requires user consent, the backend part in this case has to be done manually.

Preventing the script from executing can be achieved by changing the script element to the following:

<!-- FROM -->
<script type="text/javascript">
    // JavaScript code ...
</script>

<!-- TO -->
<script type="text/blocked"
        data-consent-element="script"
        data-consent-attribute="type"
        data-consent-value="text/javascript"
        data-consent-category="marketing">
    // JavaScript code ...
</script>

<!-- TO (with an overlay) -->
<script type="text/blocked"
        data-consent-element="script"
        data-consent-attribute="type"
        data-consent-value="text/javascript"
        data-consent-category="marketing"
        data-consent-decorates="#selector">
    // JavaScript code ...
</script>

When the script is added like in the example above, it will not be executed until the user consents to the use of marketing cookies. The JavaScript SDK evaluates the script as soon as consent to the given category is given. It then adds the data-consent-evaluated="true" attribute to denote that the script has been evaluated, and the data-consent-alternative="text/blocked" attribute to allow reverting the element if the consent is withdrawn (this actually has no effect, as the script has already been executed). Additionally, the data-consent-decorates attribute containing a query selector can optionally be added to teleport the decoration for this element elsewhere in the DOM.

Note: Inline blocked <script> elements must not contain DOMContentLoaded or load event listeners, nor a jQuery $(document).ready() call. The content must be plain JavaScript code or code that is wrapped within a normal function or an IIFE. The Frontend SDK takes care of executing the script at the right time, after the DOM is loaded and depending on whether consent has been given for the execution.
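
The attribute changes described above for a blocked inline script can be sketched as follows. This is a hypothetical illustration, not the SDK's code: `unblockScript` and the plain object standing in for the element's attributes are assumptions, and the sketch only models the attribute bookkeeping, not the actual execution of the script.

```javascript
// Hypothetical sketch, not the SDK's code: `attributes` is a plain object
// standing in for the attributes of a blocked inline <script> element.
function unblockScript(attributes, consentedCategories) {
  const category = attributes['data-consent-category'];

  if (!consentedCategories.includes(category)) {
    return attributes; // still blocked, nothing changes
  }

  const attribute = attributes['data-consent-attribute']; // "type" in this example

  return {
    ...attributes,
    // remember the blocked value so the element can be reverted later
    'data-consent-alternative': attributes[attribute],
    // restore the original value, e.g. "text/javascript"
    [attribute]: attributes['data-consent-value'],
    'data-consent-evaluated': 'true',
  };
}

const script = {
  'type': 'text/blocked',
  'data-consent-element': 'script',
  'data-consent-attribute': 'type',
  'data-consent-value': 'text/javascript',
  'data-consent-category': 'marketing',
};

const stillBlocked = unblockScript(script, ['necessary']);
const unblocked    = unblockScript(script, ['necessary', 'marketing']);

console.log(stillBlocked.type); // text/blocked
console.log(unblocked.type);    // text/javascript
```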

What Do the Sanitized Elements Look Like?

Each sanitized element will contain these attributes:

Attributes added in the Backend:
- data-consent-element:
  - The sanitized element tag name.
- data-consent-attribute:
  - The sanitized attribute name.
- data-consent-value:
  - The sanitized attribute value.
- data-consent-alternative:
  - The alternative attribute value that will be used instead of the original value (this attribute will be added in the frontend automatically if it was not specified in the backend).
- data-consent-original-{{ sanitizedAttribute:[href|src|srcset|poster|data] }} e.g. data-consent-original-src:
  - The original value of the sanitized attribute. This is useful when an element contains more than one sanitizable attribute (e.g. <video src="..." poster="...">), because the second (data-consent-attribute) and third (data-consent-value) data-attributes are overwritten when the second attribute is sanitized.
Attributes added in the Frontend:
- data-consent-category:
  - The sanitized element category.
- data-consent-decorator:
  - The sanitized element decorator (wrapper element) ID (available only on elements that are decorated).
- data-consent-evaluated:
  - The sanitized element evaluation state (available only on inline <script> elements).
Attributes with special functionality:
- data-consent-decorates:
  - The sanitized element decoratable element. This attribute is optional and can also be added at runtime. It has a very specific use case: it should contain a query selector that is used to teleport the decoration elsewhere in the DOM while keeping the binding to the blocked element (normally, the blocked element itself is decorated). This attribute is typically used with inline <script> elements that load some external resource and insert the result into a specific element on the page (for example a Leaflet map). In this case, it makes no sense to decorate the blocked element (the <script> element), but rather the element where the data is inserted (for example an empty <div> with a specific ID). This attribute is available since v1.4.0.

If you want to name these attributes something else, you can provide custom names (name translations) using the \MAKS\GDPRTools\Backend\Sanitizer::$attributes static property on the backend and/or frontend.attributes in the config file or settings.attributes on the frontend.

Example of how it's done on the Backend:

\MAKS\GDPRTools\Backend\Sanitizer::$attributes = [
    'data-consent-element'      => 'data-gdpr-element',
    'data-consent-attribute'    => 'data-gdpr-attribute',
    'data-consent-value'        => 'data-gdpr-value',
    'data-consent-alternative'  => 'data-gdpr-alternative',
    'data-consent-original-src' => 'data-gdpr-original-src',
    // data-consent-original-(href|src|srcset|poster|data) ...
];

Example of how it's done in the config file:

gdpr-tools.config.php:

Example of how it's done in the Frontend:

AbstractCmpHelper.js

JavaScript SDK

The JavaScript SDK is pretty straightforward. You can either use the compiled ConcreteCmpHelper class, or extend the AbstractCmpHelper class or the ConcreteCmpHelper class to create your own CmpHelper class. The example below demonstrates how to use the shipped ConcreteCmpHelper class.

const config = {
  cookieName: 'CmpCookie',
  objectName: 'CmpObject',
  updateEventName: 'CmpObjectOnUpdate',
  functions: {
    showDialog: () => CmpObject.showDialog(),
    consentTo: (category) => CmpObject.consentTo(category),
    isConsentedTo: (category) => CmpObject.isConsentedTo(category),
  },
  settings: {
    attributes: {
      'data-consent-element':         'data-consent-element',
      'data-consent-attribute':       'data-consent-attribute',
      'data-consent-value':           'data-consent-value',
      'data-consent-alternative':     'data-consent-alternative',
      'data-consent-original-href':   'data-consent-original-href',
      'data-consent-original-src':    'data-consent-original-src',
      'data-consent-original-srcset': 'data-consent-original-srcset',
      'data-consent-original-poster': 'data-consent-original-poster',
      'data-consent-original-data':   'data-consent-original-data',
      'data-consent-category':        'data-consent-category',
      'data-consent-decorates':       'data-consent-decorates',
      'data-consent-decorator':       'data-consent-decorator',
      'data-consent-evaluated':       'data-consent-evaluated',
    },
    categories: [
      'necessary',
      'preferences',
      'statistics',
      'marketing',
      'unclassified',
    ],
    categorization: {
      necessary: [
        'google.com/recaptcha',
      ],
      preferences: [
        'cdn.jsdelivr.net',
      ],
      statistics: [
        'google-analytics.com',
      ],
      marketing: [
        'facebook.com',
        'twitter.com',
        'google.com',
        'youtube.com',
        'youtube-nocookie.com',
      ],
      unclassified: [],
    },
    decorations: [
      'iframe',
      'img',
      'audio',
      'video',
    ],
    messages: {
      overlayTitle:        'Content of "{service}" is being blocked due to insufficient Cookies configuration!', // {name} is alias for {service}
      overlayDescription:  'This content requires consent to the "{type}" cookies, to be viewed.', // {category} is alias for {type}
      overlayAcceptButton: 'Allow this category',
      overlayInfoButton:   'More info',
    },
    classes: {
      wrapper: '',
      container: '',
      element: '',
      overlay: '',
      overlayTitle: '',
      overlayDescription: '',
      overlayButtons: '',
      overlayAcceptButton: '',
      overlayInfoButton: '',
    }
  },
};

window.cmpHelper = (new ConcreteCmpHelper(config)).update();

Note: The JavaScript SDK has an active state: it reverts the attributes of blocked elements if the consent is withdrawn (decorations are re-added to the elements) and reloads the resource if the consent is given again.
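
To get a feel for the categorization map in the config above, here is a minimal sketch of how a URL could be resolved to a category. How the SDK actually resolves overlapping entries (such as google.com and google.com/recaptcha) is not stated here, so the `categorize` helper and its longest-match rule are assumptions for illustration only.

```javascript
// Hypothetical sketch of resolving a category from the `categorization` map.
// Assumption: an entry matches when it occurs in "hostname + pathname",
// and the longest matching entry wins.
function categorize(resourceUrl, categorization) {
  const url = new URL(resourceUrl);
  const target = url.hostname + url.pathname;

  let best = { category: 'unclassified', length: 0 };

  for (const [category, patterns] of Object.entries(categorization)) {
    for (const pattern of patterns) {
      if (target.includes(pattern) && pattern.length > best.length) {
        best = { category, length: pattern.length };
      }
    }
  }

  return best.category;
}

const categorization = {
  necessary:    ['google.com/recaptcha'],
  preferences:  ['cdn.jsdelivr.net'],
  statistics:   ['google-analytics.com'],
  marketing:    ['facebook.com', 'twitter.com', 'google.com', 'youtube.com', 'youtube-nocookie.com'],
  unclassified: [],
};

console.log(categorize('https://www.google.com/recaptcha/api.js', categorization)); // necessary
console.log(categorize('https://www.google.com/maps/embed', categorization));       // marketing
console.log(categorize('https://example.org/widget.js', categorization));           // unclassified
```

With a longest-match rule, the more specific google.com/recaptcha entry takes precedence over the generic google.com entry regardless of the order of the categories.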

Extending The Frontend SDK

The *CmpHelper classes expose some properties to access the currently blocked elements and other useful information. For example, the wrapper is always created for the current viewport to best match the actual size of the element. Resizing the viewport might break the layout, as ~~the wrapper is not automatically resized by default~~ (starting from v1.4.0, the overlay updates upon window resize). For earlier versions, the following snippet can be used to resize the wrapper:

// this snippet is obsolete (>= 1.4.0), a more efficient version of this is now built-in natively
// this is only for demonstration purposes

window.addEventListener('resize', () => {
    window.cmpHelper.elements.forEach(element => {
        if (element.dataset.hasOwnProperty('consentDecorator')) {
            // refresh the decoration
            window.cmpHelper.undecorate(element);
            window.cmpHelper.decorate(element);
        }
    })
});

// the following line may be needed depending on your case
window.dispatchEvent(new Event('resize'));

Some useful events are also fired throughout the life cycle of the *CmpHelper classes, so that you can hook into them to perform additional actions. For example, you may want to give a hint about the resource that is currently being blocked. The following snippet can be used to do that (starting from v1.4.0, messages.overlayTitle contains the {service} placeholder, which injects the blocked URL hostname, or the tag name for elements that do not load external resources):

window.addEventListener('CmpHelperElementOnDecorate', event => {
    const services = {
        'google': (url) => url.pathname.includes('/maps') ? 'Google Maps' : 'Google',
        'youtube': (url) => 'YouTube',
        // other services ...
    };

    const element    = event.detail.element;
    const decoration = event.detail.decoration;
    const url        = new URL(element.dataset.consentValue);
    const service    = url.hostname.split('.').reverse().at(1); // domain without TLD
    const owner      = services[service] ? services[service](url) : '"' + url.hostname + '"';

    decoration.overlayTitle.innerText = decoration.overlayTitle.innerText.replace('Content', `Content of ${owner}`);
});
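
The placeholder substitution in the overlay messages can be pictured with the sketch below. It is not the SDK's code: the `renderMessage` helper is hypothetical, and only the placeholder names and their aliases ({name} for {service}, {category} for {type}) come from the config comments in this document.

```javascript
// Hypothetical sketch, not the SDK's code: fills the placeholders of an overlay message.
// {name} is treated as an alias for {service} and {category} as an alias for {type}.
function renderMessage(template, { service, type }) {
  const values = { service, name: service, type, category: type };

  // unknown placeholders are left untouched
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

const title = renderMessage(
  'Content of "{service}" is being blocked due to insufficient Cookies configuration!',
  { service: new URL('https://www.youtube.com/embed/VIDEO_ID').hostname, type: 'marketing' }
);

console.log(title);
// Content of "www.youtube.com" is being blocked due to insufficient Cookies configuration!
```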

Here is another example of how to make the *CmpHelper helper display messages in multiple languages:

// this event is fired very early in CmpHelper life-cycle,
// make sure it is registered before the CmpHelper is initialized
window.addEventListener('CmpHelperOnCreate', event => {
    const locale   = document.documentElement.lang.split(/([-_])/i).at(0).toLowerCase();
    const messages = {
        en: {
            overlayTitle: 'EN: Title',
            overlayDescription: 'EN: Description',
            overlayAcceptButton: 'EN: Accept',
            overlayInfoButton: 'EN: Info',
        },
        de: {
            overlayTitle: 'DE: Title',
            overlayDescription: 'DE: Description',
            overlayAcceptButton: 'DE: Accept',
            overlayInfoButton: 'DE: Info',
        },
        fr: {
            overlayTitle: 'FR: Title',
            overlayDescription: 'FR: Description',
            overlayAcceptButton: 'FR: Accept',
            overlayInfoButton: 'FR: Info',
        },
        // here more languages ...
    };

    event.detail.object.messages = messages[locale] ?? messages['en'];
});

Hint: The AbstractCmpHelper class is well documented, check out the DocBlocks of methods to learn more about the fired events (search for @fires).

Note: GDPR-Tools is meant to work with stateless apps. This means that it handles neither resources loaded dynamically (e.g. an <iframe> that loads when a modal is opened) nor scripts loaded/executed by allowed elements. These cases have to be handled manually by extending the Frontend SDK using the available events or by extending the *CmpHelper classes.

MarwanAlsoltany / gdpr-tools