appcues / fast_html

Elixir/Erlang bindings for lexborisov's myhtml. THIS IS A MIRROR, real repo at https://git.pleroma.social/pleroma/elixir-libraries/fast_html

FastHTML

A C Node wrapping lexborisov's myhtml. Primarily used with FastSanitize.

  • Available as a hex package: {:fast_html, "~> 2.0"}
  • Documentation
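
As a minimal sketch of the install step, the dependency tuple above would go into the deps list of a project's mix.exs. The surrounding deps/0 function is the standard Mix project layout, not something taken from this README:

```elixir
# mix.exs (fragment): add fast_html to the project's dependency list
defp deps do
  [
    {:fast_html, "~> 2.0"}
  ]
end
```

After running mix deps.get, decoding could then be tried from iex. This assumes the :fast_html.decode/1 function described in the package documentation; check the linked docs for the exact return shape and the available options:

```elixir
# Assumption: :fast_html.decode/1 returns {:ok, tree} for valid input
{:ok, tree} = :fast_html.decode("<h1>Hello world</h1>")
IO.inspect(tree)
```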

Benchmarks

The following table shows the median time each HTML parser usable from Elixir takes to decode a string into a tree. Benchmarks were run on a machine with an AMD Ryzen 9 3950X (32) @ 3.500GHz CPU and 32GB of RAM. To run the benchmark yourself, use the mix fast_html.bench task.

| File/Parser                 | fast_html (Port) | mochiweb_html (erlang) | html5ever (Rust NIF) | Myhtmlex (NIF)¹ |
|-----------------------------|------------------|------------------------|----------------------|-----------------|
| document-large.html (6.9M)  | 125.12 ms        | 1778.34 ms             | 395.21 ms            | 327.17 ms       |
| document-medium.html (85K)  | 1.93 ms          | 12.10 ms               | 4.74 ms              | 3.82 ms         |
| document-small.html (25K)   | 0.50 ms          | 2.76 ms                | 1.72 ms              | 1.19 ms         |
| fragment-large.html (33K)   | 0.93 ms          | 4.78 ms                | 2.34 ms              | 2.15 ms         |
| fragment-small.html² (757B) | 44.60 μs         | 42.13 μs               | 43.58 μs             | 289.71 μs       |

Full benchmark output can be seen in this snippet

  1. Myhtmlex has a C-Node mode, but it wasn't benchmarked here because it segfaults on document-large.html
  2. The slowdown on fragment-small.html is due to Port overhead. Unlike html5ever and Myhtmlex in NIF mode, fast_html has the parser process isolated and communicates with it over stdio, so even if a fatal crash in the parser happens, it won't bring down the entire VM.
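
The isolation described in note 2 can be illustrated with a generic Port sketch. This is not fast_html's actual implementation: the PortSketch module and its functions are hypothetical, cat stands in for the parser worker, and a Unix-like system with cat and kill on the PATH is assumed. It only shows that an external process reached over stdio can die without taking the VM down with it.

```elixir
defmodule PortSketch do
  # Hypothetical helper module for illustration only; not part of fast_html.

  # Start an external OS process and talk to it over stdin/stdout.
  # `cat` stands in for a real parser worker (Unix-only assumption).
  def start_worker do
    Port.open({:spawn_executable, System.find_executable("cat")}, [:binary, :exit_status])
  end

  # Send a request to the worker and wait for its reply.
  def request(port, payload) do
    Port.command(port, payload)

    receive do
      {^port, {:data, reply}} -> {:ok, reply}
    after
      1_000 -> {:error, :timeout}
    end
  end

  # Simulate a fatal crash by killing the worker's OS process,
  # then wait for the port to report that the process exited.
  def crash(port) do
    {:os_pid, os_pid} = Port.info(port, :os_pid)
    System.cmd("kill", ["-9", Integer.to_string(os_pid)])

    receive do
      {^port, {:exit_status, status}} -> {:exited, status}
    after
      1_000 -> {:error, :timeout}
    end
  end
end

port = PortSketch.start_worker()
IO.inspect(PortSketch.request(port, "<p>hi</p>\n"))
IO.inspect(PortSketch.crash(port))
IO.puts("VM is still running")
```

The trade-off matches the table: every request pays for a round trip through the operating system, which dominates on tiny inputs, while a crash in the worker surfaces as an ordinary message instead of a VM crash.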

Contribution / Bug Reports

  • Please run git submodule update after each checkout or pull
  • The project aims to be fully tested

License: GNU Lesser General Public License v2.1