peterbe / minimalcss

Extract the minimal CSS used in a set of URLs with puppeteer

Home Page:https://minimalcss.app/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Slow when HTML + CSS is large

peterbe opened this issue · comments

When I run:

▶ time node bin/minimalcss.js https://www.peterbe.com/plog-original.html -o /dev/null

It steadily takes about 6-7 seconds. That's way too slow. Let's see what we can do.

One thing I noticed was that there are a lot of low hanging fruit of re-using the "decision cache" if we just try to be a little smarter:

For example, that monster of a CSS file contains this:

.highlight .hll
.highlight
.highlight .c
.highlight .err
.highlight .k
.highlight .o
.highlight .cm
.highlight .cp
.highlight .c1
.highlight .cs
.highlight .gd
.highlight .ge
.highlight .gr
.highlight .gh
.highlight .gi
.highlight .go
.highlight .gp
.highlight .gs
...

There are about 60 of those. I know for a fact that there is document.querySelectorAll('.highlight').length===0 on that DOM. With that in mind there is no point trying document.querySelectorAll('.highlight .c') or document.querySelectorAll('.highlight .err'), etc.

One way to figure that out would be something like this:

if (selectorAsString.includes(' ')) {
  selectorParentAsString = selectorAsString.split(' ').slice(0, selectorAsString.split(' ').length - 1).join(' ');
  if (selectorParentAsString in decisionsCache) {
    if (!decisionsCache[selectorParentAsString]) {
      // No point continuing on!
      return false; // or, whatever we do when decide to NOT include it. 
    }
  }
}

Another thing I noticed was that, for this CSS, the function reduceCSSSelector was called 4,449 of which 377 are repeated. A simple memoize would fix that.

Bah! I implemented a solution. It looks like this:

            const selectorParentString = utils.selectorParentString(
              selectorString
            );
            if (selectorParentString !== null) {
              // Only proceed if the selector did have a parent.
              // E.g. selectorString was '.foo .bar'.
              // Now we can see if the decision cache has already concluded
              // that there is no '.foo' because if that's the case there's
              // no point proceeding to checking '.foo .bar'.
              if (selectorParentString in decisionsCache === false) {
                decisionsCache[
                  selectorParentString
                ] = isSelectorMatchToAnyElement(selectorParentString);
              }
              if (!decisionsCache[selectorParentString]) {
                // Indeeed! The parent was in the cache and it was concluded
                // that it is not matched to any element.
                decisionsCache[selectorString] = false;
              }
            }

I measured how many times the critical call gets called and the number went from 3,281 down to 2,793 and it didn't appear to make any difference in the total processing time.

So I experimented with three different scenarios. Ran it a bunch of times:

1. master branch untouched:

real	0m6.418s
user	0m3.956s
sys	0m0.267s

real	0m6.220s
user	0m3.767s
sys	0m0.255s

real	0m6.827s
user	0m4.296s
sys	0m0.288s

real	0m6.890s
user	0m4.260s
sys	0m0.281s

real	0m7.077s
user	0m4.526s
sys	0m0.291s

real	0m6.078s
user	0m3.612s
sys	0m0.262s

2. master branch but replace return dom(selectorString).length > 0; with return false:

real	0m3.804s
user	0m1.109s
sys	0m0.220s

real	0m3.568s
user	0m1.111s
sys	0m0.224s

real	0m3.715s
user	0m1.073s
sys	0m0.209s

real	0m3.704s
user	0m1.074s
sys	0m0.212s

real	0m3.752s
user	0m1.121s
sys	0m0.209s

real	0m3.762s
user	0m1.114s
sys	0m0.211s

3. My experimental branch with the selectorParentString hack and some memoizations:

real	0m5.979s
user	0m3.269s
sys	0m0.263s

real	0m5.991s
user	0m3.440s
sys	0m0.272s

real	0m5.694s
user	0m3.217s
sys	0m0.251s

real	0m6.086s
user	0m3.435s
sys	0m0.281s

real	0m5.790s
user	0m3.197s
sys	0m0.255s

real	0m5.634s
user	0m3.123s
sys	0m0.250s

Squintig at those numbers; it seems my hack gives us a 0.5 second speed boost. Not impressive. Need to find out what's really causing it to slow down.

Actually, I found that I can avoid even more expensive calls by splitting on ancestor selector too! E.g. from .ui.dropdown>.text to .ui.dropdown.
That reduced the total number of selectors that needs to be checked from 4,449 to 2,599.

Hoping that cheerio has a faster way to find out if an element exists at all: cheeriojs/cheerio#1265

@lahmatiy Hey, chatting here instead of Twitter.

I did some experiments and noticed that for a large CSS frameworks like Sematic-UI there are a LOT of CSS selectors like this:

.ui.comments > .reply.form {
  margin-top: 1em;
}
.ui.comments .comment .reply.form {
  width: 100%;
  margin-top: 1em;
}
.ui.comments .reply.form textarea {
  font-size: 1em;
  height: 12em;
}

So instead of doing document.querySelector('.ui.comments > .reply.form') and document.querySelector('.ui.comments .comment .reply.form') and document.querySelector('.ui.comments .reply.form textarea') (3 calls), I can just do this: document.querySelector('.ui.comments') and if that is falsy I can know there's no point bothering with .ui.comments ....

I store all lookups in an object. I call it the "decision cache". That helps me avoid doing too many document.querySelector(...) calls. E.g.

// Pseudo code
const decisions = {};

if (!document.querySelector('.ui.comments')) {
  decisions['.ui.comments > .reply.form'] = false;
  decisions['.ui.comments .comment .reply.form'] = false;
  decisions['.ui.comments .reply.form textarea'] = false;
}

At first I just did this: selector.split(/[>\s]+/); but that's too naive. For example, it totally doesn't work if the CSS seletor is .ui[class*='mobile reversed'].grid > .row. Hence my question in Twitter :)

hey @peterbe

thought you might be interested in a similar project of mine: https://github.com/leeoniya/dropcss

I also originally started with CSSTree, plus css-select and node-html-parser but could not get the performance i was expecting - a lot more back-story here: https://old.reddit.com/r/javascript/comments/bb7im2/dropcss_v100_an_exceptionally_fast_thorough_and/

it doesnt require Puppeteer, though you can easily use it if you need JS execution. i tried sending your html [1] and css [2] through and it finished in 130ms (no, that's not a typo). i originally came to give some advice, but then realized that there are optimizations all the way through the whole architecture of DropCSS (for which i had to ditch all deps), so i'm not sure how much it would help here : (

cheers!

[1] https://www.peterbe.com/plog-original.html
[2] https://www.peterbe.com/static/css/base.min.79787297dbf5.css

@peterbe You already parse a selector to AST, so you can split the selector (node list) by the first combinator if any, and then translate it to a string. You also can normalize a compound selector (e.g. .foo.bar and .bar.foo may became the same) and filter some parts (e.g. pseudos). I can write a code later if you need a help with it.

@leeoniya Did you try to switch off detailed parsing (anything except selectors)? It can boost CSS parsing.
To be honest, writing a CSS parser is not as simple as it looks like at first glance. Currently your solution fails in many cases. Some issues I found: https://codepen.io/anon/pen/zXdjNY). And that’s excluding escaping, string values in attribute selectors and content property, custom properties values, features that are coming soon (have no support by some or all browsers currently, e.g. :not(<selector-list>), nth-*(... of <selector-list>), :has(), :where(), nested rules etc). I believe that correctness should be over performance.

@lahmatiy

Did you try to switch off detailed parsing (anything except selectors)? It can boost CSS parsing.

yes i turned off a bunch or prelude parsing, and i used only visit: rule, you can see the earlier versions of DropCSS in its commit history/releases before v0.5.0. i don't think the parsing itself was the bottleneck, though it certainly helps to do it faster with much smaller data structures and less mem allocation. reducing the exponential growth of testing every selector against every html node obviously the major optimization - DropCSS has a similar decision cache for this.

Currently your solution fails in many cases. Some issues I found: https://codepen.io/anon/pen/zXdjNY).

i'll have to dig through that list and see what's worth handling, thanks!

And that’s excluding escaping, string values in attribute selectors and content property, custom properties values, features that are coming soon (have no support by some or all browsers currently, e.g. :not(), nth-*(... of ), :has(), :where(), nested rules etc).

the goal of DropCSS was to support the 99% use case, as is stated in the README. i've run it through my own CSS/HTML for several sizeable projects in addition to Bootstrap, Materialize, Semantic UI, Bulma, (and Tailwind with some meh pre-processing). That's a pretty wide swath of real-world cases. certainly it's easy to construct artificial but valid cases that trip it up, but this is design choice rather than an oversight.

I believe that correctness should be over performance.

for CSSTree - a CSS parser - of course correctness is number one, no one is arguing that :)

i'm excited to see how fast, thorough and compact you guys can get minimalcss while being fully spec compliant (both for html and css parsing).

quick rundown of https://codepen.io/anon/pen/zXdjNY

*|p{font-size:25px}

super obscure - like 0.0001% usage at best. probably won't bother.

@media all { p { color: blue }}
@media none { p { color: purple }}

this is left in by design. DropCSS does not evaluate media queries, and will leave any @media blocks that contain selectors which are found in html.

p{color:red;animation:2s foo infinite}

@keyframes infinite {
  0% { background: red }
  100% { background: white }
}
@keyframes foo {
  0% { background: green }
  100% { background: white }
}

not sure what you're expecting here (maybe that @keyframes infinite to purged?) @keyframes foo is incorrectly purged. dropcss only tests for animation names at beginning or end of the animation shorthand (again, the 99% case). it doesnt have a reserved keyword list for the animation:. i cannot imagine this super-loose property is fun for you to parse, either (who the hell comes up with this stuff, right?). the fact that DropCSS removes unused @keyframes and @font-face blocks is a bonus, since it's an intra-css optimization and is more in the land of clean-css, csso, etc.

div[id~=root]{font-weight:bold}

~= is not implemented, but also not widely used. should be trivial to add support for it though [1]. EDIT: now fixed

p:not(:nth-child(n+3)) { color: green }

there's the bug :) squashed!

[1] https://github.com/leeoniya/dropcss/blob/master/src/find.js#L23

@leeoniya One thing to keep in mind here is that there's a difference between performance and performance :)
Check out the conclusion here: #296 (comment)
The total time was 6 seconds and became 5 seconds after a bunch of optimization. That page/URL was and is extreme and 90% of the total time is waiting for Chrome (through puppeteer). I think that's almost all I/O bound work.

Suppose that csstree has some imperfect flaws that could be faster, I think that'd make it potentially go from 0.002s to 0.001s since it's most string work in Node and that's almost always fast enough.

Also, about 10+ years ago I wrote a tool called mincss which was written Python with regular expressions. It could only really look for whitespace and it never felt great. As more and more crazy CSS features come into the world, by regular expressions slowly faded away more and more. With that experience, I would always bet on a AST parser. Even if it's a little bit slow.

Another thing to ponder is that puppeteer does take a lot of time. Second, to that is the work of doing !!document.querySelectorAll(selector) but with cheerio. My work in #296 was ultimately to avoid those calls that can be avoided but there's still a lot that needs to be done and I wish it was faster. Know anything better than cheerio?

@lahmatiy

you can split the selector (node list) by the first combinator if any, and then translate it to a string

that's what I do here:

minimalcss/src/utils.js

Lines 62 to 70 in 3a4b93f

/**
* Given a string CSS selector (e.g. '.foo .bar .baz') return it with the
* last piece (split by whitespace) omitted (e.g. '.foo .bar').
* If there is no parent, return an empty string.
*
* @param {string} selector
* @return {string[]}
*/
function getParentSelectors(selector) {

I'm pretty pleased with the result, from the outside, of that function. It gets what I need. I'm curious if you can see anything horrible about it from the inside.

@lahmatiy

You also can normalize a compound selector (e.g. .foo.bar and .bar.foo may became the same)

I strongly doubt that's worth it. It would mean, if there's no DOM node called .foo.bar then we could cache that and not have to bother with .bar.foo but I don't think made CSS files have that inconsistency. ...much.

... and filter some parts (e.g. pseudos). I can write a code later if you need a help with it.

Not sure what you mean there.

Suppose that csstree has some imperfect flaws that could be faster, I think that'd make it potentially go from 0.002s to 0.001s since it's most string work in Node and that's almost always fast enough.

thankfully, we don't have to "suppose" when we can measure. here's what i get on my laptop (prior numbers were from desktop):

if i do a full "proper" parse using CSStree:

const fs = require('fs');
const csstree = require('css-tree');

const css = fs.readFileSync('base.min.79787297dbf5.css', 'utf8');

const start = +new Date();

let ast = csstree.parse(css);

console.log(+new Date() - start);

180ms

fully processing your CSS plus HTML with DropCSS:

const puppeteer = require('puppeteer');
const fetch = require('node-fetch');
const dropcss = require('../../dist/dropcss.cjs.js');
const fs = require('fs');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.peterbe.com/plog-original.html');
    const html = await page.content();
    const styleHrefs = await page.$$eval('link[rel=stylesheet]', els => Array.from(els).map(s => s.href));
    await browser.close();

    await Promise.all(styleHrefs.map(href =>
        fetch(href).then(r => r.text()).then(css => {
            let start = +new Date();

            let clean = dropcss({
                css,
                html,
            });

            console.log({
                stylesheet: href,
                cleanCss: clean.css,
                elapsed: +new Date() - start,
            });

            fs.writeFileSync('out.css', clean.css, 'utf8');
        })
    ));
})();

214ms

of this 214ms, CSS processing (which includes tokenization/parse) is 35ms.

Also, about 10+ years ago I wrote a tool called mincss which was written Python with regular expressions. It could only really look for whitespace and it never felt great. As more and more crazy CSS features come into the world, by regular expressions slowly faded away more and more. With that experience, I would always bet on a AST parser. Even if it's a little bit slow.

DropCSS does not use only regular expressions. also worth mentioning that most parsers will use numerous regular expressions in their tokenizers. I would never write a parser using just regexes especially when matching nested opening/closing braces is needed.

The total time was 6 seconds and became 5 seconds after a bunch of optimization. That page/URL was and is extreme and 90% of the total time is waiting for Chrome (through puppeteer). I think that's almost all I/O bound work....Another thing to ponder is that puppeteer does take a lot of time.

Again, this is easily quantifiable. Here is what Chrome's perf timeline shows (with JS disabled) when simply loading https://www.peterbe.com/plog-original.html. This excludes any I/O and is purely the work needed for parsing/layout/render. I'm not sure how much of this work is done in headless mode but i suspect all of it unless you have some very specific options set, assuming they exist at all.

peterbe com

Second, to that is the work of doing !!document.querySelectorAll(selector) but with cheerio. My work in #296 was ultimately to avoid those calls that can be avoided but there's still a lot that needs to be done and I wish it was faster. Know anything better than cheerio?

if cheerio is doing the selector parsing every time you use it, you're already starting considerably behind, since CSSTree would already have parsed it. this is why i rewrote DropCSS from its previous architecture, you need to have CSS and HTML systems that share information rather than being isolated. without having this you're going to be repeating the same work all over the place, even with aggressive caching strategies.

looks like cheerio uses the same underlying css-select dependency as DropCSS used before the rewrite. it was not built with performance in mind, i looked at its codebase when debugging and found that it does many things very inefficiently. i originally picked it for its vast selector support but had to ditch it due to its poor perf. i did not find anything that was both fast and had sufficient selector coverage so had to write it myself.

if you're using cheerio you should be using querySelector, not querySelectorAll so it bails after the first match.

That's soo cool. I'd love to experiment with making minimalcss depend on DropCSS. I love the idea of how minimalcss can get you the rendered DOM as a HTML string, including any DOM mutations (first) by things like window.onload but once it has the HTML strings (per URL) and the CSS hrefs, technically it could replace csstree+cheerio if it works out.

I'd love to see a rough patch to drop in DropCSS (no pun intended) and it could be a flag like

$ ./bin/minimalcss.js --use-dropcss https://www.url.com

Then you could do:

$ time ./bin/minimalcss.js --use-dropcss https://www.peterbe.com/plog-original.html -o /tmp/with.min.css
$ time ./bin/minimalcss.js https://www.peterbe.com/plog-original.html -o /tmp/without.min.css
$ prettier /tmp/without.min.css > /tmp/without.css
$ prettier /tmp/with.min.css > /tmp/with.css
$ diff -w /tmp/without.css /tmp/with.css

The crucial test would obviously be to test this against the big well-known CSS frameworks and see what their quirks do to the minimal css and what difference it makes ultimately.

Hint hint @leeoniya ^ :)

Truth be told, I got started making a hack to minimalcss so you could pass in a HTML string (stdin) or .html file and then use everything already in minimalcss except the puppeteer stuff. Didn't get very far due to time constraint. But I think, doing a thing that skips puppeteer+cheerio might be easier because it's all inside the guts. The API won't have to change.

Mind you, what the existing code does is that it when it opens pages, it the HTML strings aren't kept. Instead, they're put straight into cheerio objects are "cheerio DOMs" so you'd have to mess with that a little.

That's soo cool. I'd love to experiment with making minimalcss depend on DropCSS. I love the idea of how minimalcss can get you the rendered DOM as a HTML string, including any DOM mutations (first) by things like window.onload but once it has the HTML strings (per URL) and the CSS hrefs, technically it could replace csstree+cheerio if it works out.

what would be the value-add of --use-dropcss over the DropCSS/Puppeteer demo i posted above and the same advice in its docs? if you'd like to add this option to minimalcss, then certainly feel free to do so.

Hint hint @leeoniya ^ :)

i'm not sure i got this one :) did you want me to compare the diff between dropcss and minimalcss outputs of your site?

what would be the value-add

What I'm saying is that minimalcss current bundles two distinct pieces of functionality. A) getting the HTML B) analyzing the HTML.
It's kinda nuts that it's all baked into 1 thing but here we are.
I like all the work and effort that's gone into (A) but I wish (B) was something you can optionally swap in and out for other solutions.

In other words, today minimalcss is puppeteer + csstree+cheerio. What --use-dropcss would do is change that to puppeteer + dropcss.

if you'd like to add this option to minimalcss, then certainly feel free to do so.

I do. But struggling to find the time. It's easier to think and comment about it rather than writing a patch. It was my optimistic hope that you'd have the inclination but I totally understand if you don't.