HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community

Home Page:https://almanac.httparchive.org

Investigate ability to store WASM binary files for analysis

tunetheweb opened this issue · comments

As discussed here: HTTPArchive/httparchive.org#416, binary files (including WASM binaries) are discarded as part of the HTTP Archive crawl. Last year we performed analysis on these binary files, so presumably we want to do that again this year. That analysis would be easier if we had the binary files from the crawl instead of having to fetch them again manually like we did last year.

@pmeenan any more thoughts on this since last year?

FYI @rviscomi @siakaramalegos @ColinEberhardt

Here's the relevant WPT feature request: catchpoint/WebPageTest.agent#437

I'd be a lot more comfortable if we could do on-agent analysis (like we do with images and fonts) instead of uploading the whole WASM. It's not used on a lot of sites, so the impact might not be too crazy, but WASM payloads can be HUGE, particularly for transpiled C++ projects like Adobe Photoshop.

Squoosh has ~2MB of WASM for example.

If we know what stats or info we'd like to extract (e.g. features used), I'd rather we build that into the Python code directly and extract it than store transpiled ffmpeg, etc.

Here's the repo @RReverser built to analyze the Wasm payloads, if it helps: https://github.com/RReverser/wasm-stats

Looks like it is largely grouping instructions by type and counting them (and a few other stats). That's FAR better done at the edge and just uploading the stats themselves.

There are a few ways to do it. We could port the Rust code to Python and update the agent itself, or we could pre-build a Linux binary of the existing code, bake it into the images and just run the current code against all WASM payloads on the edge.

I'd prefer a python port since it would be cross-platform and could be upstreamed directly into the mainline agent code but either works.
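
To give a sense of what a pure-Python pass on the agent could look like, here is a minimal sketch that walks the top-level sections of a WASM binary and reports their byte sizes, custom section names and whether a start section is present. It is not a port of wasm-stats: the function names and the output shape are made up for illustration, and it does not decode or count instructions, which is the part that would take the real porting effort.

```python
WASM_MAGIC = b"\x00asm"

# Section ids as defined by the WebAssembly binary format.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data_count",
}


def read_u32_leb128(data, pos):
    """Decode an unsigned LEB128 u32 at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7
        if shift >= 35:  # a u32 never needs more than 5 bytes
            raise ValueError("LEB128 value too long for u32")


def summarize_sections(data):
    """Hypothetical helper: per-section payload sizes for one module."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary")
    summary = {
        "version": int.from_bytes(data[4:8], "little"),
        "sections": {},
        "custom_sections": [],
        "has_start": False,
        "total": len(data),
    }
    pos = 8
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_u32_leb128(data, pos + 1)
        payload_end = payload_start + size
        if payload_end > len(data):
            raise ValueError("section runs past end of file")
        name = SECTION_NAMES.get(section_id, f"unknown_{section_id}")
        summary["sections"][name] = summary["sections"].get(name, 0) + size
        if section_id == 0:
            # A custom section payload starts with its length-prefixed name.
            name_len, name_start = read_u32_leb128(data, payload_start)
            raw_name = data[name_start:name_start + name_len]
            summary["custom_sections"].append(raw_name.decode("utf-8", "replace"))
        elif section_id == 8:
            summary["has_start"] = True
        pos = payload_end
    return summary
```

Something like `summarize_sections(open(path, "rb").read())` would be enough for size-style stats; anything involving instruction categories needs a full decoder for the code section.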

Looks like it is built on top of his wasmbin library so porting the whole thing to Python may be a bit much. Right now it expects a full directory of wasm files in a command-line param but it should be easy enough to change it to take a single wasm file as input and generate the json output.

We can build the binaries of the Rust crate, ship the binaries with the agent and use them when running on Linux.
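
A rough sketch of how the agent side of that could look in Python, assuming a bundled binary that takes a single wasm file path as its only argument and prints JSON to stdout. That command-line shape is an assumption (the existing tool expects a directory), and the function names here are hypothetical rather than taken from the agent code.

```python
import json
import os
import platform
import subprocess


def should_run_wasm_stats(binary_path, system_name=None):
    """Only use the bundled binary on Linux, and only if it is executable."""
    system_name = system_name or platform.system()
    return (
        system_name == "Linux"
        and os.path.isfile(binary_path)
        and os.access(binary_path, os.X_OK)
    )


def collect_wasm_stats(command, wasm_path, timeout=30):
    """Run `command + [wasm_path]` and parse stdout as JSON.

    Returns None on any failure so a bad payload never breaks the test run.
    The single-file CLI is an assumption, not the tool's current interface.
    """
    try:
        proc = subprocess.run(
            list(command) + [wasm_path],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return json.loads(proc.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return None
```

Returning `None` instead of raising keeps the stats strictly optional: a missing binary, a timeout on a huge payload or malformed output just means that request carries no stats.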

I should be able to fork it and play with it later this week (never done Rust before so there may be a bit of a learning curve but shouldn't be too bad).

Just added wasm-stats to the agent so each request in the HAR will have the stats for that request:

                "_wasm_stats": {
                    "funcs": 1777,
                    "instr": {
                        "total": 138519,
                        "proposals": {
                            "atomics": 0,
                            "ref_types": 0,
                            "simd": 0,
                            "tail_calls": 0,
                            "bulk": 0,
                            "multi_value": 0,
                            "non_trapping_conv": 0,
                            "sign_extend": 135,
                            "mutable_externals": 0,
                            "bigint_externals": 0
                        },
                        "categories": {
                            "load_store": 17683,
                            "local_var": 48800,
                            "global_var": 2979,
                            "table": 0,
                            "memory": 1,
                            "control_flow": 20751,
                            "direct_calls": 6948,
                            "indirect_calls": 725,
                            "constants": 20334,
                            "wait_notify": 0,
                            "other": 20298
                        }
                    },
                    "size": {
                        "code": 294477,
                        "init": 109812,
                        "externals": 1021,
                        "types": 567,
                        "custom": 0,
                        "descriptors": 1791,
                        "total": 407684
                    },
                    "imports": {
                        "funcs": 118,
                        "memories": 1,
                        "globals": 0,
                        "tables": 1
                    },
                    "exports": {
                        "funcs": 36,
                        "memories": 0,
                        "globals": 0,
                        "tables": 0
                    },
                    "custom_sections": [],
                    "has_start": false
                },

Sample test: https://dev.webpagetest.org/result/220419_XQ_1/
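
For anyone consuming these stats from a HAR, here is a minimal sketch that pulls the `_wasm_stats` objects out and rolls them up. It assumes the field sits directly on each entry under `log.entries` and that the URL is at `request.url`, as in a standard HAR; the function names are made up for illustration.

```python
from collections import Counter


def iter_wasm_stats(har):
    """Yield (url, stats) for every HAR entry carrying _wasm_stats."""
    for entry in har.get("log", {}).get("entries", []):
        stats = entry.get("_wasm_stats")  # assumed location of the field
        if isinstance(stats, dict):
            yield entry.get("request", {}).get("url", ""), stats


def summarize_wasm_usage(har):
    """Aggregate instruction totals and proposal usage across modules."""
    modules = 0
    total_instr = 0
    proposals_used = Counter()   # number of modules using each proposal
    categories = Counter()       # summed instruction counts per category
    for _url, stats in iter_wasm_stats(har):
        modules += 1
        instr = stats.get("instr", {})
        total_instr += instr.get("total", 0)
        for name, count in instr.get("proposals", {}).items():
            if count > 0:
                proposals_used[name] += 1
        categories.update(instr.get("categories", {}))
    return {
        "modules": modules,
        "total_instr": total_instr,
        "proposals_used": dict(proposals_used),
        "categories": dict(categories),
    }
```

Loading a saved HAR with `json.load` and passing the result to `summarize_wasm_usage` gives a per-page rollup; a proposal is counted once per module that uses it, not once per instruction.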

Closing this as we don't want to store the actual binaries. If we need new stats, we can update wasm-stats to collect them.

Awestruck! Yet again!

+1 to that, thank you @pmeenan for integrating this with the test agents so quickly.

What would be the process if we wanted to add/change any wasm stats?

Someone can submit a PR to the fork of the wasm-stats code with the desired changes. I can take care of building the binary and updating the agent, but I'm not going to pretend I know how the actual stats code works :D