Parsed HTML Rewriter

A DOM-based implementation of Cloudflare Worker's HTMLRewriter.


UPDATE: While this module works just fine, I've made a new version that is WASM/streaming-based for much better performance.


Unlike the original, this implementation parses the entire document into a DOM (provided by linkedom) and runs selectors against this representation. As a result, it is slower and more memory-intensive than the original, and it cannot process streaming data.
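To make the non-streaming limitation concrete, here is a minimal sketch of the buffered model in general, not this module's actual code. `bufferedTransform` is a hypothetical helper: it has to read the whole body with `response.text()` before any rewriting can start, whereas a streaming rewriter such as lol-html can emit output while input is still arriving. It assumes a runtime with the standard Fetch API globals (`Response`, `Headers`).

```typescript
// Hypothetical helper illustrating the buffered (non-streaming) model.
// In a DOM-based rewriter, `rewrite` would stand for: parse into a DOM,
// run selectors and handlers, then serialize back to HTML.
async function bufferedTransform(
  response: Response,
  rewrite: (html: string) => string,
): Promise<Response> {
  // Nothing can be emitted until the entire body has been received and
  // held in memory, which is why this approach cannot stream.
  const html = await response.text();
  const output = rewrite(html);

  // Keep status and headers, but drop any stale Content-Length,
  // since the rewritten body may have a different length.
  const headers = new Headers(response.headers);
  headers.delete('content-length');

  return new Response(output, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

The cost of this design is that memory use grows with the size of the document and the first byte of output is delayed until the last byte of input has arrived.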

Note that this approach was chosen to quickly implement the functionality of HTMLRewriter, as there was no JS implementation available at the time. A better implementation would replicate the streaming approach of lol-html, or even use a WebAssembly version of it. Update: a WASM-based version is now available.

However, this implementation should run in most JS contexts (including Web Workers, Service Workers and Deno) without modification and handle many, if not most, use cases of HTMLRewriter. It should be good enough for testing and offline Workers development.

Usage

This module can be used in two ways.

As a standalone module:

import { ParsedHTMLRewriter } from '@worker-tools/parsed-html-rewriter'

await new ParsedHTMLRewriter()
  .transform(new Response('<body></body>'))
  .text();

Or as a polyfill:

import '@worker-tools/parsed-html-rewriter/polyfill'

await new HTMLRewriter() // Will use the native version when running in a Worker
  .transform(new Response('<body></body>'))
  .text();
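The comment in the polyfill example implies that the global is only defined when the runtime lacks a native HTMLRewriter. The general pattern looks roughly like the following sketch; `FallbackHTMLRewriter` and `installHTMLRewriterPolyfill` are hypothetical names, and this is not the module's actual polyfill source.

```typescript
// Hypothetical stand-in for a non-native implementation.
class FallbackHTMLRewriter {
  transform(response: Response): Response {
    return response; // placeholder: a real fallback would rewrite the body
  }
}

function installHTMLRewriterPolyfill(): void {
  const g = globalThis as unknown as Record<string, unknown>;
  // Only define the global when the runtime does not already provide one,
  // so environments with a native HTMLRewriter keep using it.
  if (typeof g.HTMLRewriter === 'undefined') {
    g.HTMLRewriter = FallbackHTMLRewriter;
  }
}
```

Checking for an existing global before assigning is what lets the same import run unchanged both inside and outside Cloudflare Workers.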

innerHTML

Unlike the version on CF Workers at the time of writing (March 2021), this implementation already supports the proposed innerHTML handler. Note that this feature is unstable and will likely change once the official version materializes.

await new HTMLRewriter()
  .on('body', {
    innerHTML(html) {
      console.log(html) // => '<div id="foo">bar</div>'
    },
  })
  .transform(new Response('<body><div id="foo">bar</div></body>'))
  .text();

Caveats

  • Because this version isn't based on streaming data, the order in which handlers are called can differ from the native HTMLRewriter. Some measures have been taken to simulate the native order, but differences may still occur.
  • Text never arrives in multiple chunks. Each text node is delivered as a single chunk, followed by an empty chunk with lastInTextNode set to true.
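Because chunking differs between this implementation and the native one, a text handler that needs the complete contents of a text node can buffer chunks until `lastInTextNode` is true. A minimal sketch, assuming handlers receive objects with the `text` and `lastInTextNode` properties of Cloudflare's text chunk; `accumulateText` and `TextChunk` are hypothetical names used for illustration:

```typescript
// Assumed shape of a text chunk passed to a `text` handler.
interface TextChunk {
  readonly text: string;
  readonly lastInTextNode: boolean;
}

// Hypothetical helper: buffers chunks until the text node is complete,
// then calls onTextNode once with the full string. Works whether the
// rewriter delivers one chunk per node or many.
function accumulateText(onTextNode: (text: string) => void) {
  let buffer = '';
  return (chunk: TextChunk): void => {
    buffer += chunk.text;
    if (chunk.lastInTextNode) {
      onTextNode(buffer);
      buffer = ''; // reset for the next text node
    }
  };
}
```

With either rewriter, it could be wired in as the `text` member of a handler object, e.g. `.on('p', { text: accumulateText((t) => console.log(t)) })`, so code written against this module should keep working when the native, chunked version is used.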

This module is part of the Worker Tools collection.

Worker Tools are a collection of TypeScript libraries for writing web servers in Worker Runtimes such as Cloudflare Workers, Deno Deploy and Service Workers in the browser.

If you liked this module, you might also like:

  • 🧭 Worker Router --- Complete routing solution that works across CF Workers, Deno and Service Workers
  • 🔋 Worker Middleware --- A suite of standalone HTTP server-side middleware with TypeScript support
  • 📄 Worker HTML --- HTML templating and streaming response library
  • 📦 Storage Area --- Key-value store abstraction across Cloudflare KV, Deno and browsers.
  • 🆗 Response Creators --- Factory functions for responses with pre-filled status and status text
  • 🎏 Stream Response --- Use async generators to build streaming responses for SSE, etc...
  • πŸ₯ JSON Fetch --- Drop-in replacements for Fetch API classes with first class support for JSON.
  • πŸ¦‘ JSON Stream --- Streaming JSON parser/stingifier with first class support for web streams.

Worker Tools also includes a number of polyfills that help bridge the gap between Worker Runtimes:

  • ✏️ HTML Rewriter --- Cloudflare's HTML Rewriter for use in Deno, browsers, etc...
  • πŸ“ Location Polyfill --- A Location polyfill for Cloudflare Workers.
  • πŸ¦• Deno Fetch Event Adapter --- Dispatches global fetch events using Deno’s native HTTP server.

For more, visit workers.tools.

License: MIT