1904labs / dom-to-image-more

Generates an image from a DOM node using HTML5 canvas

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add `onclone` size optimization for inline styles

zm-cttae opened this issue · comments

commented

Use case: description, code

In an ideal world we want to optimize SVG output by removing styles that can be inherited.

This mainly has an impact at >>1K nodes where the output is in the 1MB scale of magnitude.

There's not much of a story here at the moment, but the library did previously lean on unset until the #90 regression:

function copyUserComputedStyle(sourceElement, sourceComputedStyles, targetElement, root) {
const targetStyle = targetElement.style;
const inlineStyles = sourceElement.style;
for (let style of sourceComputedStyles) {
const value = sourceComputedStyles.getPropertyValue(style);
const inlineValue = inlineStyles.getPropertyValue(style);
inlineStyles.setProperty(style, root ? 'initial' : 'unset');
const initialValue = sourceComputedStyles.getPropertyValue(style);
if (initialValue !== value) {
targetStyle.setProperty(style, value);
} else {
targetStyle.removeProperty(style);
}
inlineStyles.setProperty(style, inlineValue);
}
}

There's two strategies I tried to research:

  • The curveball method: reimplement CSSOM in the "cloning" stage using document.styleSheets
    • Multiple PITAs and footguns involved (specificity, source order and the size of this library)
    • Inheritance is done with a while loop on .parentNode, and a hardcoded list of inheritable CSS props
  • The easier, "just as good" way: a post-process op in toSvg
    • Append the output to our iframe sandbox
    • use a TreeWalker to iterate over the nodes
    • see if removing an inline value changes the computed style

Some starter code, but this code snippet:

  1. doesn't restore CSS props in place, making it indeterministic and liable to break styling
  2. doesn't go in ascending order up the DOM tree, thus removing inherited properties that are necessary
const treeWalker = document.createTreeWalker(document.querySelector('foreignObject > *'), NodeFilter.SHOW_ELEMENT)
const elementList = [];
let currentNode = treeWalker.currentNode;

while (currentNode) {
    elementList.push(currentNode);
    currentNode = treeWalker.nextNode();
}

elementList.forEach(function(element) {
    const inlineStyles = element.style;
    const computedStyles = getComputedStyle(element);
    util.asArray(inlineStyles).forEach(function(name) {
        if (inlineStyles.cssText.includes(name + ': ' + value + ';')) {
            const value = inlineStyles.getPropertyValue(name);
            inlineStyles.removeProperty(name);
            if (value !== computedStyles.getPropertyValue(name)) {
                inlineStyles[name] = value;
            }
        }
    });
});

I'm really steering away from this change right now. Really do NOT want to delve into the very fragile territory that it brings trying to white-list things. I will add a override path for the SVG path to the impl.utils so you can monkey-patch in something to build upon while we figure things out...

commented

Sounds all right. Happy hols! I imagine the impl.utils solution will keep all happy.


Wrote filterWinningInlineStyles algorithm that's "good enough", for any interested. Expand if you want to see a small wall of code.

This code expects the iframe/SVG window to be of the same width as the original document.

Optimizations

  1. When traversing DOM tree of node, group nodes by descending node depth.

    CSS inheritance is computed on the DOM tree via preorder traversal and is additive-cumulative (increases styling data). For the filter op which is subtractive, we want to traverse the tree in the opposite direction.

    The algorithm sorts elements in the node tree by descending node depth. (This is known as reverse level order traversal.)

    This gives us a 30% to 40% speed boost. This also ensures declarations are only removed when they really can be inherited.

  2. When filtering each inline style declaration by computed effect, go for the most hyphenated properties first.

    In CSS, shorthands consistently have less hyphens than their longhand. We want to filter out scenarios where a CSS property matches their shorthand, e.g. block-size -> height or border-color -> border.

    The algorithm does a radix sort with bitmasks for standard, custom and vendored proprties, then subsorts descending hyphen count.

    In tests this filtered another 50% of inline styling. We also get a 20-40% speed boost because we're not setting as many properties back.

/* eslint no-implicit-globals: "error" */
(function(global) {
    const contentElement = global.document.body || global.document.documentElement;
    const cssBlockCommentRegex = /\/\*[^*]*\*+([^/*][^*]*\*+)*\//g;
    const cssDeclarationColonRegex = /;\s*(?=-*\w+(?:-\w+)*:\s*(?:[^"']*["'][^"']*["'])*[^"']*$)/g;

    /**
     * Filter inline style declarations for a DOM element tree by computed effect.
     * Estimated inline style reduction at 80% to 90%.
     *
     * @param {HTMLElement} clone
     *      HTML clone with styling from inline attributes and embedded stylesheets only.
     *      Expects fonts and images to have been previously embedded into the page.
     * @returns {Promise<HTMLElement>}
     *      A promise that resolves to the `clone` reference, now stripped of inline styling
     *      declarations without a computed effect.
     */
    global.dominlinestylefilter = function(clone) {
        const context = new Context(clone);
        return new Promise(stageCloneWith(context))
            .then(collectTree)
            .then(sortAscending)
            .then(multiPassFilter)
            .then(unstageClone);
    };

    /**
     * Synchronous version of {@link onclone}.
     * @param {HTMLElement} clone
     * @returns {HTMLElement}
     */
    dominlinestylefilter.sync = function(clone) {
        let context = new Context(clone);
        try {
            let value = execute(stageCloneWith(context));
            [collectTree, sortAscending, multiPassFilter, unstageClone]
                .forEach(function(fn) { value = fn(value) });
            return value;
        } catch(e) {
            unstageClone(context);
            throw e;
        }
    };

    /**
     * Process context to propogate in promise chain.
     * @param {HTMLElement} clone
     *      Node with all computed styles dumped in the inline styling.
     * @constructor
     */
    function Context(clone) {
        this.root = clone;
        this.sibling = clone.nextSibling;
        this.parent = clone.parentElement;
        this.sandbox = null;
        this.self = null;
        this.tree = null;
        this.pyramid = null;
        this.delta = null;
    }

    /**
     * Styling data for a HTML element.
     * @param {HTMLElement} element Element in the DOM tree of clone.
     * @param {Context} context
     * @constructor
     */
    function Styles(element, context) {
        this.inline = element.style;
        this.computed = context.sandbox.contentWindow.getComputedStyle(element);
    }

    /**
     * Promise executor function.
     * @typedef {(resolve: (value: Context) => void, reject: (reason?: string) => void) => void} Executor
     */

    /**
     * Synchronously execute a promise executor function.
     * @param {Executor} executor
     */
    function execute(executor) {
        let result;
        const resolver = (value) => { result = value; };
        const rejector = (reason) => { throw new Error(reason); };
        executor(resolver, rejector);
        return result;
    }

    /**
     * Creates a hidden, rendered sandbox <iframe> so we can insert & render the clone,
     * process computed styles and run the compression algorithm.
     * To make sure we match the styling, the iframe uses the original viewport
     * dimensions and document character set.
     *
     * @returns {HTMLIFrameElement} {@link Context.sandbox}.
     */
    function createSandbox() {
        const iframe = global.document.createElementNS('http://www.w3.org/1999/xhtml', 'iframe');
        iframe.style.visibility = 'hidden';
        iframe.style.position = 'fixed';
        iframe.width = global.innerWidth;
        iframe.height = global.innerHeight;
        iframe.sandbox.add('allow-same-origin');
        contentElement.appendChild(iframe);
        const charsetDeclaration = iframe.contentDocument.createElement('meta');
        charsetDeclaration.setAttribute('charset', global.document.characterSet || 'UTF-8');
        iframe.contentDocument.head.appendChild(charsetDeclaration);
        iframe.contentDocument.title = iframe.id = 'dominlinestylefilter-sandbox';
        return iframe;
    }

    /**
     * Generates a new sandbox DOM for the process context and appends the clone to it.
     *
     * @param {Context} context
     * @returns {Executor} Promise executor function.
     */
    function stageCloneWith(context) {
        return (resolve, reject) => {
            context.sandbox = createSandbox();
            context.self = context.sandbox.contentWindow;
            context.self.document.body.appendChild(context.root);
            if (!context.sandbox.parentElement) reject('failed to append sandbox iframe to DOM');
            resolve(context);
        };
    }

    /**
     * Generates a list of elements in the DOM tree of `context.clone`, sorted by descending position (source order).
     * @param {Context} context
     */
    function collectTree(context) {
        const walker = context.self.document.createTreeWalker(context.root, global.NodeFilter.SHOW_ELEMENT);
        context.tree = [walker.currentNode];
        let clone;
        while ((clone = walker.nextNode())) context.tree.push(clone);
        return context;
    }

    /**
     * Sorts elements into ascending DOM tree ("pyramid") like the native CSSOM, because CSS inheritance is bottom-up.
     * Elements are sorted by ascending order of succession from the root then descending source order.
     * Without the sort algorithm, explicit style values matching `inherit` are removed.
     * @param {Context} context
     */
    function sortAscending(context) {
        let depth = 1, highest = depth;
        const stack = [];
        const depths = context.tree.map(function(element, i) {
            const previous = this[i - 1] || null;
            if (element.parentElement === previous) {
                depth += 1;
                stack.push(i - 1);
                if (depth > highest) highest = depth;
                return depth;
            }
            if (element.previousElementSibling === previous) {
                return depth;
            }
            let parentIndex = stack.pop();
            while (typeof parentIndex === 'number' && element.parentElement !== this[parentIndex]) {
                depth -= 1;
                parentIndex = stack.pop();
            }
            return depth;
        }, context.tree);
        context.pyramid = [];
        for (let depth = highest; depth > 0; depth--) {
            for (let i = 0; i < context.tree.length; i++) {
                if (depths[i] === depth) context.pyramid.push(context.tree[i]);
            }
        }
        context.tree = null;
        return context;
    }

    /** Multi-pass inline CSS data optimization. */
    function multiPassFilter(context) {
        context.pyramid.forEach(stripBlockComments);
        context.root.querySelectorAll('style').forEach(filterWinningMediaQueries);
        while (context.delta !== 0) {
            context.delta = 0;
            context.pyramid.forEach(filterWinningInlineStyles.bind(context));
        }
        return context;
    }

    /**
     * Strip block comments from inline style.
     */
    function stripBlockComments(element) {
        const value = element.getAttribute('style');
        if (!cssBlockCommentRegex.test(value)) return;
        element.setAttribute('style', value.replace(cssBlockCommentRegex, ''));
    }

    /**
     * Strip inactive media queries from embedded stylesheets.
     */
    function filterWinningMediaQueries(style) {
        if (style.media && !global.matchMedia(style.media).matches) {
            style.parentElement.removeChild(style);
            return;
        }
        if (!style.textContent.includes('@media')) return;

        const mediaRuleRegex = /@media([^{]+)\{((?:(?!\}\s*\})[\s\S])*\})\s*\}/;
        let mediaRuleMatch;
        while ((mediaRuleMatch = mediaRuleRegex.exec(style.textContent))) {
            const conditionText = mediaRuleMatch[1].trim();
            const cssText = mediaRuleMatch[2].trim();
            if (global.matchMedia(conditionText).matches) {
                style.textContent = style.textContent.replace(mediaRuleMatch[0], cssText);
            } else {
                style.textContent = style.textContent.replace(mediaRuleMatch[0], '');
            }
        }
    }

    /**
     * Exploratory filter to reduce an inline style to winning declarations (<2ms / element).
     * Destructively remove declarations and check if there is a computed value change. If so, restore.
     *
     * @param {HTMLElement} element
     *      Element in the DOM tree of `clone`.
     * @this {Context}
     */
    function filterWinningInlineStyles(element) {
        if (!element.attributes.style) return;

        const styles = new Styles(element, this);
        this.delta += styles.inline.cssText.length;

        // Hack to disable dynamic changes in CSS computed values.
        // Prevents false positives in the declaration filter.
        const animations = { 'animation-duration': '', 'transition-duration': '' };
        for (const name in animations) {
            animations[name] = styles.inline.getPropertyValue(name);
            if (animations[name]) styles.inline.setProperty(name, '0s');
        }

        // Splice explicit inline style declarations without a computed effect in place.
        // By prioritising standard CSS properties & lots of hyphens, we reduce attack time & perf load.
        tokenizeCssTextDeclarations(styles.inline.cssText)
            .map(getCssTextProperty)
            .sort(compareHyphenCount)
            .forEach(spliceCssTextDeclaration.bind(styles));

        // Restore dynamic CSS properties.
        for (const name in animations) if (animations[name].length) styles.inline.setProperty(name, animations[name]);

        this.delta -= styles.inline.cssText.length;

        if (element.getAttribute('style') === '') element.removeAttribute('style');
    }

    /**
     * Tokenize inline styling declarations.
     *
     * @param {string} cssText
     *      Inline style attribute value for a HTML element.
     * @returns {string[]}
     *      List of inline styling declarations.
     */
    function tokenizeCssTextDeclarations(cssText) {
        return cssText.replace(/;\s*$/, '').split(cssDeclarationColonRegex);
    }

    /**
     * Get property name from CSS declaration.
     *
     * @param {string} declaration
     *      Inline style declaration for a HTML element.
     * @returns {string}
     *      The CSS property for `declaration`.
     */
    function getCssTextProperty(declaration) {
        return declaration.slice(0, declaration.indexOf(':'));
    }

    /**
     * Sorts an array of CSS properties by the number of hyphens, keeping vendored prefixes last.
     * Optimize for compression gains and early hits by sending shorthand, vendored and custom properties last.
     *
     * @param {string} a
     *      First CSS property name.
     * @param {string} b
     *      Second CSS property name.
     * @returns {number}
     *      See {@link Array.prototype.sort}.
     */
    function compareHyphenCount(a, b) {
        const isCustom = (name) => /^--\b/.test(name);
        const isVendored = (name) => /^-\b/.test(name);

        return (
            (isCustom(a) & !isCustom(b)) * 0b1000000 |
            (isVendored(a) & !isVendored(b)) * 0b0100000 |
            Math.max(a.split('-').length - b.split('-').length, 0b0011111)
        );
    }

    /**
     * Filters style declarations in place to keep the algorithm deterministic.
     * The styles dumped by `copyUserComputedStyleFast` are position-dependent.
     *
     * @param {string} name
     *      Name of the CSS property explicitly declared in the inline styling.
     * @this {Styles}
     */
    function spliceCssTextDeclaration(name) {
        if (name === 'width' || name === 'height') return; // cross-browser portability
        if (name === 'animation-duration' || name === 'transition-duration') return; // dynamic properties

        const value = this.inline.getPropertyValue(name);
        const declarations = tokenizeCssTextDeclarations(this.inline.cssText);
        const index = declarations.findIndex(d => name === getCssTextProperty(d));
        if (index === -1) return;

        this.inline.cssText = declarations.filter((_, i) => i !== index).join('; ') + ';';
        if (value === this.computed.getPropertyValue(name)) return;
        this.inline.cssText = declarations.join('; ') + ';';
    }

    /**
     * Detaches clone element from the sandbox, then deletes the sandbox <iframe>.
     * @param {Context} context
     */
    function unstageClone(context) {
        if (context.parent) {
            context.parent.insertBefore(context.root, context.sibling);
        }
        if (context.sandbox && context.sandbox.parentElement) {
            contentElement.removeChild(context.sandbox);
        }
        if (context.sandbox) {
            context.self = null;
            context.sandbox = null;
        }
        context.tree = null;
        return context.root;
    }

}(globalThis))

Thanks for giving a shot at trying to implement this!

I agree with @IDisposable. I think this algorithm is getting too complex for it to be feasible to maintain as part of the library. Users who desire better SVG size optimizations should override the behavior on their own. We don't want to maintain code that will have to be frequently updated to keep up with the spec.

A comment on the robust-ness on this algorithm: The algorithm looks good so far. As I understand it, it tries removing the property on an SVG clone of the DOM tree, to see if the computed value changes. If the computed value stays the same, the property can be removed. We process children elements before their parents.

I would like to know if you have tested if removing the property with the given name (in filterWinningInlineStyles) can affect other properties on element, or affect the property on the children of element. Because some styles are derived from other styles, e.g. setting display:flex; on an element changes the styles that are set on children elements, I'm worried this could cause unintended repercussions. It seems to be more risky than copyUserComputedStyleFast, which was only removing styles that matched the default as well as the parent's inherited value.

commented

Since my last comment there was a gotcha (removing explicit width and height caused a content jump in larger screenshots). Not filtering them (as this library has done) fixed that.

Closing this as we can now package up this filter and users can insert it in the module using the options.onclone function.


I would like to know if you have tested if removing the property with the given name (in filterWinningInlineStyles) can affect other properties on element, or affect the property on the children of element.

I don't have data to present, but as I understand it, the filter isn't aggressive or behaviour-dependent enough to produce this footgun. The user agent stylesheets also avoid flex and grid.

One day later: realised that CSS properties are qualified like -this, so now the filter optimises by targeting the properties with more hyphens first (thus leaving more shorthands on the page).

commented

Reopening as I don't want to be the one to maintain an NPM package.

I've updated #92 (comment) for options.onclone compatibility.

Here is a possible README for a dom-inline-style-filter package:
# `dom-inline-style-filter`

`dom-inline-style-filter` library filters inline style declarations for a standalone DOM element tree by computed effect.

- As web developers, we would like elements that ship only with inline styling to be light.
- A main use case of this is SVG screenshots of HTML elements.
- Even after an algorithm to [filter out user agent styling when inlining the style](https://github.com/1904labs/dom-to-image-more/issues/70), there is some way to go with data size.

## Usage

### `dominlinestylefilter(node)`

**Parameter:** `node` - a `HTMLElement` with all style rules embedded as inline style attributes or `<style>` tags.

**Returns:** a `Promise` that resolves to `node`. Within `node`, all inline styling has been filtered to the minimum declarations that produce the same computed style.

### `dominlinestylefilter.sync(node)`

Synchronous version. Returns `node` when the styling compression is completed.

## Optimizations

1.  **When traversing DOM tree of `node`, group nodes by descending node depth.**

    CSS inheritance is computed on the DOM tree via preorder traversal and is additive-cumulative (increases styling data). For the filter op which is subtractive, we want to traverse the tree in the opposite direction.
    
    The algorithm sorts elements in the `node` tree by descending node depth. (This is known as reverse level order traversal.)

    This gives us a 30% to 40% speed boost. This also ensures declarations are only removed when they really can be inherited.

2.  **When filtering each inline style declaration by computed effect, go for the most hyphenated properties first.**

    In CSS, shorthands consistently have less hyphens than their longhand. We want to filter out scenarios where a CSS property matches their shorthand, e.g. `block-size` -> `height` or `border-color` -> `border`.

    The algorithm does a radix sort with bitmasks for standard, custom and vendored proprties, then subsorts by descending hyphen count.

    In tests this filtered another 50% of inline styling. We also get a 20-40% speed boost because we're not setting as many properties back.

## Performance

This data is collected from manual testing on the output of the `domtoimage.toSvg` function in the `dom-to-image-more` NPM package.

| Wikipedia article demo    | Value                                |
| :------------------------ | :----------------------------------- |
| Number of nodes           | 5894 nodes                           |
| Initial declaration count | 128581 (21.8 declarations / node)    |
| Pre-compression bytes     | 3.77mb                               |
| Reductions                | [2924992, 257746, 87120, 0]          |
| Processing time           | 9642.8ms (1.64 ms/node)              |
| Total reduction           | 3.27mb                               |
| Output declaration count  | 15853 (2.69 / node)                  |
| Post-compression bytes    | 504.8kb                              |
| Compression quotients     | [0.2252, 0.6967, 0.8525]             |
| Total quotient            | 0.1337                               |
| Decay formula             | $$exp(-2.3x)$$                       |

| Code screenshot demo      | Value                                |
| :------------------------ | :----------------------------------- |
| Number of nodes           | 468 nodes                            |
| Initial declaration count | 11397 (24.4 declarations / node)     |
| Pre-compression bytes     | 373608b                              |
| Reductions                | [292044, 34152, 0]                   |
| Processing time           | 382ms (0.8 ms / node)                |
| Total reduction           | 326196b                              |
| Post-compression bytes    | 47412b                               |
| Output declaration count  | 1777 (3.78 / node)                   |
| Compression quotients     | [0.2183, 0.5813]                     |
| Total quotient            | 0.1269                               |
| Decay formula             | $$exp(-2.2x)$$                       |

(Note that optimization 1 handles parent styling clashes. Optimization 2, with width-height filter whitelisting, should prevent properties that interact from being removed unfairly (plus we are only remvong the inline style property if - once removed - the inline value matches the computed value exactly - a more narrow criterion than checking if computed values all stayed the same.) Plus most of what we want to filter out is the extra longhand properties that the algo attacks first.

I will be working on this next.

commented

I have extracted the solution design here into a standalone package that's usable as we speak :)

Thanks for doing that... I've been hacking in .Net land of late and haven't been able to address much here... might be able to sweep this into a build soonish, but I love that you've made a package to allow others to benefit from it now!

That package.json should probably declare a dependency on some base-version of dom-to-image-more... :)