krausest / js-framework-benchmark

A comparison of the performance of a few popular JavaScript frameworks

Home Page: https://krausest.github.io/js-framework-benchmark/


Note: Implementation uses manual DOM manipulations [sticky]

krausest opened this issue

These implementations use direct DOM modification in the end-user code. This means specific DOM-updating code is written, tailored to these specific tests.
Those implementations are expected to perform very close to vanillajs (which consists of manual DOM manipulations).

I believe the WASM implementations like wasm-bindgen and stdweb are in this category as well.

Worth mentioning domdiff (it's the underlying reconciler behind uhtml, lighterhtml, and hyperhtml). I think stage0 is worth consideration too, but it is a bit more debatable. These libraries at least have a reconciler, but there are no "bindings"; they just wire up their own DOM operations in the open. Worth at least taking a look.

This benchmark should not force any programming style, it should just be a framework for exercising different libraries that perform certain functionality.

If you want to consider the "effort to write the code", there are better ways to measure that. You can add "lines of code" of the implementation as a measure that shows how much code a developer will need to write to do the included functionalities.

As I mentioned in the jQuery issue, this benchmark should allow all types of libraries. People can decide to use this library or not based on the performance as well as the lines of code (a measure of effort). The personal preference for syntax or style should not be covered here.

I think lines of code are a poor metric for effort. I can write very concise code that is hard to reason about yet very performant. Or I can write Elm, which is easy to write but has a disproportionate number of LoC. That being said, people can just look at the implementations and make their own judgement call, assuming they do equivalent things.

The problem with non-data-driven approaches is that they ultimately become meaningless from a comparison perspective, outside of serving as a guidepost for optimal performance. The scenario presented here is an approximation of a category of solution but shouldn't represent a specific one. Let me explain why this matters. I think there are 3 factors at play.

Having knowledge of the specific problem allows certain types of optimizations that would make no sense for a general solution. However, restricting these lower-level libraries makes no sense as there is no reason jQuery should adopt React's abstraction. It's an artificial constraint.

Conversely, every library has an escape hatch back to VanillaJS. So an implementor could just grab a ref and wire everything up by hand. Sure, it bypasses the library, but it is no more effort or complexity than the VanillaJS implementation. We've seen that in implementations here. It is a tempting approach to get performance. One could argue it is even idiomatic. But any library could do the same thing.

We've gotten to a place where declarative libraries are knocking on the door of vanilla performance anyway. So it's a shrinking zone; I'm not even sure what the idiomatic jQuery implementation looks like anymore. How vanilla is it?

I am a little curious how jQuery performs these days myself, and we have enough reference builds to compare against, so maybe it is worth a go. Like, how does it compare against optimal WASM written in Rust? But without some common ground these stop testing the same thing. Arguably we are past that point already, but there is still value in at minimum categorizing the solutions.

about the lines of code

Well, LoC is one of the many things that show the effort it took to write some code. I don't mean that this is the best solution, but it at least gives some context. For example, if there were some LoC number, I could see that the raw wasm-bindgen implementation is too verbose.

Maybe another way is to add a hover effect to show the code for a certain framework.

about jQuery

That's my point. There is no reference for its performance, and because of that, people everywhere say that "jQuery is not good since it is slow!".

about the too specific optimizations

These are hard to detect, and it gets into a gray area. In my opinion, the programming style or approach is not a good way to detect this. Instead, if some implementation explicitly drops from the official API to the Vanilla JS API in the human-written code, then that is a good sign for detecting this.

On the other hand, there may be some frameworks that have made these optimizations their "official API". For example, they say do this if you want to achieve this functionality. I think we should allow these types of implementations as long as they are considered "first-class API".

This is not hard to imagine if WebAssembly finally gets direct access to Web IDL APIs. That would mean wasm-based frameworks could beat JavaScript. If this happens, we should start porting our JavaScript libraries to Wasm. Using AssemblyScript, this would not be that hard! Maybe we get a "solid-wasm" someday!

a jQuery impl would be pretty much identical to vanilla, both in perf and imperative nature.

jQuery is "slow" because people tend to over-use selectors in loops (and selectors were implemented in software/Sizzle, not querySelectorAll). jquery code frequently manipulates attrs rather than properties, and uses things like innerHTML and string parsing for creating DOM elements via e.g. $("<div>").

if you avoid using these jQuery facilities, you just end up with a super thin sugar layer over vanilla dom, and it all becomes the same...and pointless.
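
a rough sketch of that contrast (my own illustration; the `rows` data and `#grid` markup are made up, not from the benchmark):

```js
// Hypothetical data/markup for illustration.
const rows = [{ label: "apple" }, { label: "pear" }];

// Slow path: selector engine re-run inside a loop, plus string parsing
// ($() on an HTML string) to create elements.
for (let i = 0; i < rows.length; i++) {
  $("#grid .row").eq(i).attr("class", "row danger"); // selector + attr work on every iteration
  $("#grid").append($(`<div class="row">${rows[i].label}</div>`));
}

// Thin-sugar path: query once, set properties, build nodes directly --
// at which point jQuery is barely distinguishable from vanilla DOM.
const grid = $("#grid")[0];
const frag = document.createDocumentFragment();
for (const row of rows) {
  const div = document.createElement("div");
  div.className = "row";
  div.textContent = row.label;
  frag.appendChild(div);
}
grid.appendChild(frag);
```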

btw, https://github.com/franciscop/umbrella is a modern/tiny jQuery alternative, with a matching API.

That's why I am interested to see (one or more) implementations here. When there is no point of comparison, we can't compare them.

Here it points to different jQuery-like libraries. It would be nice to implement these and compare them against each other.
https://github.com/franciscop/umbrella#alternatives

We could also add a jQuery 4 implementation, which would be interesting to look at.

@aminya
I love the idea of "lines of code" for the framework benchmark implementations. I'm also imagining a preview of the code showing how each framework implements the test cases, instead of diving through the test's source code.

Well, jQuery is a library..
For me it's different from a framework that has a template system, two-way data binding, SSR, and an architecture like MVW, MVVM, or MVC. While a framework gives the developer many facilities for building a whole application, a library provides facilities that help with a specific part. Some frameworks need to compile the app's source code separately and some can compile it on the fly, while a library is more plug and play. Usually a framework that brings the compiler along at runtime will be fatter in size and need more time to boot up in the browser. A library can depend on other libraries, and a developer can add it to an app with just one line of code. But when a library depends on a framework, the developer may not be able to use that library with other frameworks.

Some people may not know how to benchmark JavaScript, but if you want to compare library performance you can easily use jsbench.me to do that. Based on the link you mentioned, I created a simple benchmark for jQuery and its friends. I included ScarletsFrame in the test as a sample framework because it has jQuery-like features.


Why isn't jQuery included? Because the main goal of this project is framework benchmarking.
Let's look at a framework template system and how we can compare it with jQuery.
Some random Framework:

```html
<!-- The framework may need to compile/parse this template on the fly -->
<div id="content">
    <div @for="val in list">dummy {{ val.text }}</div>
</div>
```

```js
// Depending on the framework's design, the implementation or the
// available features may differ between frameworks
var content = new Framework('#content', {
    list: [{text: "Henlo"}]
});

// A data-driven framework may not need you to update the DOM manually

// Edit the first array value
content.list[0].text = "Hello"; // The framework may already bind this property and update the DOM

// Add new data to the array
content.list.push({text: "World!"}); // The framework may immediately add a new element to the DOM
```

With jQuery: example result

```html
<div id="content"></div>
```

```js
var content = $('#content');
var list = [{text: "Henlo"}];

// We need to manually update the DOM to match the list
for(var i=0; i<list.length; i++){
    // Bottleneck: jQuery needs to parse the string to generate the new element
    content.append(`<div>dummy ${ list[i].text }</div>`);
}

// Edit the first array value
list[0].text = "Hello"; // We need to manually update the DOM after changing this
content.children().eq(0).text(`dummy ${ list[0].text }`);

// Add new data to the array
list.push({text: "World"}); // Again, we need to manually update the DOM
content.append(`<div>dummy ${ list[1].text }</div>`);
```

For me, not including jQuery in the test is acceptable, because its level of implementation is different from a framework's. It's not like I hate jQuery; I love its simple syntax for some specific cases. However, a framework is more optimized to handle multiple cases in an elegant way.

Maybe we get a "solid-wasm" someday!

Calm down bruh.. Don't light that war, or I will join the forces..

To be honest: I'm not too excited about maintaining a few jQuery-based (or similar) frameworks.
I think the biggest issue is that it's hardly possible to define rules for a programming style such that it doesn't converge to vanillajs.

Based on the link you mentioned, I created a simple benchmark for jQuery and its friends. I included ScarletsFrame in the test as a sample framework because it has jQuery-like features.

These types of benchmarks are far from real-world operations (who wants to calculate the length!). In contrast, the benchmarks in this repository resemble real-world situations quite well.

@aminya It's true that the benchmark I provided is about counting the elements that have a dummy class inside two div elements. But do you think you will never want to count the elements that match a selector's rule?

The test code on JSBench is editable; you can just modify it to use .text('my test') instead of .length.
It's also true that these kinds of benchmarks are a small part of real-world operations, but that doesn't mean these small parts will never affect the bigger parts! -- who wants to calculate the length?

A library provides a collection of small parts or tools that help developers build their own design or architecture. With a framework, these small parts are managed by the framework itself to provide the medium parts, so the developer can immediately design their app without needing to manage the DOM directly. The benchmark in this repository compares how frameworks handle these small or native JavaScript functions in some given situations. With jQuery we may need to manage or access the DOM directly, just like VanillaJS does; or instead we may need to use query selectors more often, or parse text to create elements, to avoid the benchmark's source code being very dependent on native DOM functions.

@krausest Do you agree that the WASM libraries should probably be marked as reference builds due to this issue? I think there is more grey area elsewhere, but wasm-bindgen and stdweb are 100% direct-DOM-manipulating reference builds.

With some older libraries being removed due to inactivity, I think the number of implementations using imperative methods for select row has been reduced. I'm on board with tightening up the definition of data-driven a bit here, although I suspect at this point there might be enough libraries in the greyer area that they could almost fill their own chart. I'm ignoring explicit/implicit event delegation, which I think is hard to argue against because it can be "the way" to do stuff in some libraries.

I've tried my best to look at every implementation (except reflex-dom) and categorize it fairly. While I like to think I understand how every library works, I can make mistakes, so bear with me.

What I'm calling a dirty model is a less obvious problem, as it works without direct DOM manipulation. It's just a bit localized, since it means that if this were, say, a global store, you'd carry the selected state over to every different view. It goes against the universality of the solution. On the other hand it isn't unreasonable, although it sort of turns select into another partial update test.

Also, the difference between, say, the end user assigning (calling a setter on) an element directly in these cases or via a proxy is a moot point. You may have internalized the mechanism, but the end user still needs this imperative break-out. Syntactically, it is still a per-line mutation of the underlying data model instead of a hoisted selected-state lookup. Some libraries have the ability to keep only the single selected row in the store but pass escape hatches to the user in the update cycle to write directly (Solid used to do this). This is all the same thing, as it gets away from declarative binding syntax in your template.

For this reason I'm more comfortable categorizing a library like attodom or redom this way. Its hyperscript returns DOM elements, so there is no abstraction there. But putting the class name assignment not in the hyperscript but in a side mutation is clearly an imperative escape hatch. Sticking to this criterion of nested selection assignment in user land, I can fairly easily categorize all the implementations. But I could see how that is idiomatic attodom/redom.
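
To illustrate the distinction, a toy sketch (not code from any listed implementation; `h` is a minimal hyperscript stand-in defined here, not a library API):

```js
// Minimal hyperscript stand-in: creates a real DOM node, as attodom/redom do.
function h(tag, props = {}, text = "") {
  const el = Object.assign(document.createElement(tag), props);
  if (text) el.textContent = text;
  return el;
}

// Declarative: selection is expressed inside the binding itself.
const renderRow = (row, selectedId) =>
  h("tr", { className: row.id === selectedId ? "danger" : "" }, row.label);

// Imperative escape hatch: the hyperscript only creates the node, and the
// class name is applied later as a side mutation in user land.
const tr = h("tr", {}, "some label");
tr.className = "danger"; // nested selection assignment outside the template
```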

Clean

solid (after #794)
domc
solid-state (after #794)
ko-jsx (after #794)
vuerx-jsx (after #794)
mobx-jsx (after #794)
ivi
petit-dom
inferno
dominator
domvm
uhtml
hyperapp
imba
endorphin
lighterhtml
elm
hyperhtml
svelte
bobril
dyo
lit-element
resonatejs
aurelia
dojo
reaml-preact
angular-optimized
crank
hullo
datum
riot
isotope
vue-next
preact
neverland
heresy
binding-scala
misojs
mithril
angular-noopzone
vue
nervjs
react-redux-hooks
yew
vidom
react-hooks
reaml-react
glasgow
angular
react-tracked
vue2-composition-api
helix
angular-ng
react-easy-state
marko
ganic
angularjs
maquette
react
knockout
react-redux
rax
ractive
react-mobX
reason-react
reagent
miso
choo
blazor-wasm

Dirty Model (put selected on each row data)

mikado
sinuous
scarletsframe
sifrr
crui
san
mimbl
lit-html
doz
ember
glimmer

Direct DOM Manipulation

vanillajs1
vanillajs
stage0
fidan
vanillajs-wc
wasm-bindgen
fullweb-helpers
fullweb-template
domdiff
stdweb
attodom
redom
marionette
fntags
apprun
marionette-jquery


Reflections:
On the positive side, there are fewer non-reference direct-DOM libraries than I was expecting. Some libraries clearly depend on using vanilla JS to operate, or, like marionette, are just orchestrating whatever you put in. But there are libraries like apprun that look completely declarative and then sneak direct DOM updates in right at the last couple of lines, and Fidan is of a similar nature. Attodom, redom, and fntags are all very similar in that they generate the DOM via templates or hyperscript but then use manual DOM manipulation to bind expressions and perform updates.

Some popular libraries are dirtying the model: lit-html, ember. lit-html is especially interesting, as lit-element does not. Given that the lit-element version doesn't do a WC per row, I imagine the majority of the overhead between these 2 approaches might come down to something like this.

The vast majority of libraries using direct DOM and dirty models are in the top 30, but not all of them. The top 30 looks very different if this is corrected.

Counter-proposal: instead of trying to define the impossible (and inflammatory), like what's a hack / dirty / cheating etc., solve the problem from the other side by reporting results to only 1 significant digit in the results app. Aka all these frameworks are at 1.0, these at 1.1, these at 1.2, etc. No 1.06, 1.07, etc., which is meaningless precision anyway.

Randomize the sort order of results that are equal to that precision.
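
A sketch of what that could look like (my own sketch, not the results app's code; it assumes a `results` array of `{name, factor}` where `factor` is the slowdown vs. vanilla, e.g. 1.06):

```js
// Truncate each slowdown factor to one decimal, then randomize the order
// within each bucket so no framework is consistently "first".
const bucket = f => Math.floor(f * 10) / 10;

function rank(results /* [{ name, factor }] */) {
  const groups = new Map();
  for (const r of results) {
    const b = bucket(r.factor);
    if (!groups.has(b)) groups.set(b, []);
    groups.get(b).push(r);
  }
  const ranked = [];
  for (const b of [...groups.keys()].sort((x, y) => x - y)) {
    const g = groups.get(b);
    for (let i = g.length - 1; i > 0; i--) { // Fisher-Yates shuffle within a bucket
      const j = Math.floor(Math.random() * (i + 1));
      [g[i], g[j]] = [g[j], g[i]];
    }
    ranked.push(...g);
  }
  return ranked;
}
```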

So from the m85 results, vanilla, mikado, solid, stage0, sinuous, fidan and domc are all in the 1.0 club, and a random one will happen to be "first" each time the page reloads. Congrats guys, time to focus on something other than the last 1% -- or, even worse, the hope that you get a "lucky run" and show up at the top of the next results page, or, even worse than that, whether another dev's 1% improvement was cheating or not.

For framework developers, have an internal flag in the results app to switch it into "meaningless mode" where 3 significant digits are used instead of 1. At least, that's how I used to run it when I was comparing two approaches in Surplus.

I think this would maximize the really awesome parts of this benchmark -- devs using it to improve their frameworks and cross-pollinate ideas -- while minimizing the really sucky parts -- devs trying to stick stink tags on their competitors.

@adamhaile Your suggestion seems out of the scope of this issue. You should probably create another issue for that, so it is not lost here.

@ryansolid I think that the categorization of the libraries is certainly a good approach. In combination with the hover effect for showing the source code (and probably the number of lines of code), we can quickly get an idea of what the library's code would be like.

@adamhaile You're certainly right about the precision, and I really considered reducing it, but wouldn't the necessary rounding make the ranking quite unstable? (E.g. dominator 1.24 => 1.2, domvm 1.25 => 1.3; I guess that won't be as stable as we'd want it to be.) But if I find the time I'll try to make the compare mode easier to use. I think this could help.

@ryansolid
As a first step I marked the implementations with direct dom manipulation.

Regarding the dirty model: I currently think this way of modeling the row state might be considered okay. If I remember right, this was good practice for some frameworks (was it ember or even jsf?) long ago, when it was hard to call functions in your templates.

But I think we should mark the following implementations with issue 772 for reaching out to parent nodes in client code:
sifrr
sinuous
san
lit-html

Would you agree?

I feel like I didn't quite explain my issue with the dirty model. In one sense you could say it tests something different. Per-row selection is a lot like update every 10th row. That being said, everyone wants it to be an O(2) operation instead of O(n), and what I'm doing internally in Solid isn't that different. I'm iterating over all rows but only doing heavier computations on the ones affected.

No, it's more that it changes how you write your implementation in a way that could be ambiguous. Like, what's the difference between:

el.className = "danger"

row.class = "danger"

If it's a proxy, those could be the exact same thing. Those libraries use a technique that doesn't proxy into a change detection system but actually proxies directly into one-to-one element writes. The data model can't be reused and is tied to those exact DOM nodes. These libraries already have syntax to perform the operation in the template the other way but sort of sidestep it to do it this way. Which is arguably fine, but it's ambiguous. Once you change selection into an implementation that is about setting individual rows, it opens this up.
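
A toy version of that kind of proxy (my illustration, not any specific library's internals):

```js
// A "model" whose property writes are wired one-to-one to a specific DOM
// element -- row.class = "danger" below is literally el.className = "danger".
function bindRow(el, data) {
  return new Proxy(data, {
    set(target, key, value) {
      target[key] = value;
      if (key === "class") el.className = value;        // direct element write
      if (key === "label") el.textContent = "" + value; // ditto
      return true;
    }
  });
}

const tr = document.createElement("tr");
const row = bindRow(tr, { class: "", label: "" });
row.class = "danger"; // looks data-driven, but the model is tied to this exact node
```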

Taking this further: once you mark the row, you can start caching the previously selected row. That is essentially the same as holding the TR like the vanillajs implementation does. To be fair, we can't really crack down on this, nor is it necessarily a bad technique, beyond the fact that it doesn't scale past the local template. But it works fine here. Not all libraries that put selected on the model do this; it's just that the change in how you write the implementation opens the door.


Event delegation personally isn't an issue to me, but others would disagree. Mostly because it is the de facto technique for this. Libraries that don't have it built in tell you in their docs that it's an idiomatic way to solve these sorts of problems. It's been used universally since it first could be used, and almost every library either does it automatically or would have their implementors do it. Implicit event delegation like you see in many frameworks often has problems with DOM standards like Shadow DOM retargeting and knowledge of composed events. Almost every VDOM library here would fail that. But a library like lit-html, used with Web Components all the time, wouldn't add event delegation to the core for that reason. Yet they aren't supposed to use the technique where it makes sense while everyone else can? Unlike what we've talked about above, it isn't avoiding the library's built-in syntax; it's just the way to solve the problem.
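
For reference, explicit (user-land) delegation typically looks something like this sketch; `selectRow`/`removeRow` are hypothetical app handlers and the row markup is assumed:

```js
// One listener on the table body; the handler traces the DOM tree off the
// event object instead of binding a listener per row.
const tbody = document.querySelector("tbody");
const selectRow = id => console.log("select", id); // hypothetical handlers
const removeRow = id => console.log("remove", id);

tbody.addEventListener("click", e => {
  const tr = e.target.closest("tr");
  if (!tr) return;
  if (e.target.closest(".remove")) removeRow(tr.dataset.id);
  else selectRow(tr.dataset.id);
  // Note: with Shadow DOM, e.target is retargeted to the host element,
  // which is one reason WC-friendly libraries don't bake this into the core.
});
```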

All that being said, it isn't part of the library and takes implementation-specific code, so it definitely doesn't look as nice. I'm going to have to defer any perspective on this to others. But I know the Web Component crowd would probably say disallowing it doesn't make sense.


I guess what I'm saying is that, in my opinion, this is sufficient for now and we see how things go. I do think both of these might warrant further examination or refinement, but I can't completely fault implementations for doing so.

Mind you is it only those 4 libraries doing explicit event delegation? I would have expected more.

Another way to look at the dirty model thing: pretty much any library could implement it this way, and it improves things from O(n) to O(2). That means it's an easy win for every library in the first group. Should we just go ahead and do that? It has a definite performance improvement. It's not quite as performant as direct DOM manipulation, but depending on the type of library it can be closer to that than to doing it the way most libraries do.

It isn't not data-driven. It just changes the test from a delegated state change to a partial update. This especially benefits granular reactive libraries (like mine or Sinuous).

Anyone have opinions? I thought the unwritten rule, to mirror real scenarios, was that you wouldn't want to dirty the model. But if that is just a presumption, it's an easy fix.
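
Concretely, the two shapes of the select implementation look something like this (a sketch with stand-in row objects holding an `el` reference, not any library's code):

```js
// Clean model: one hoisted selected id; every row's binding re-evaluates (O(n)).
let selectedId = null;
function selectClean(id, rows) {
  selectedId = id;
  for (const row of rows)
    row.el.className = row.id === selectedId ? "danger" : "";
}

// Dirty model: selected lives on each row; only two rows are touched (O(2)),
// and caching the previous row is one small step from holding the TR itself.
let prevSelected = null;
function selectDirty(row) {
  if (prevSelected) { prevSelected.selected = false; prevSelected.el.className = ""; }
  row.selected = true;
  row.el.className = "danger";
  prevSelected = row;
}
```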

@leeoniya @luwes

I took a look at what impact it has for react and angular. One model is immutable, the other is mutable.
Both benefit from the dirty model.

You can review the code in #799.
I'm currently not sure what my take on it is. This benchmark never forced a programming style for the model (for which it has been criticized), but I really think that each framework has its own best practices for organizing its model and should be allowed to use them. React favours immutability; vue.js (though I believe it sometimes bemoans itself for it) and svelte's reactivity model favour mutation.
I think the code for the hack isn't actually much worse (though keeping the class name in the row state might be considered much worse than keeping a selected boolean). In a real-world app I'd actually consider implementing the hack when it solves a performance bottleneck.
Your opinions?

How about adding a new challenge to the templates to make sure the framework doesn't proxy the data directly or bind to only one element/attribute?

I've simplified the template example a bit.

From the old challenge:

```html
<tr class="{{ selected ? 'danger' : '' }}">
  <a class="lbl">{{ label }}</a>
</tr>
```

Into the new challenge:

```html
<tr class="{{ selected ? 'danger' : '' }} id-{{ id }}">
  <a class="lbl">
    {{ label }} (selected: {{ selected }})
  </a>
</tr>
```

Example result:

```js
// Model in JavaScript
list = [{id:1, selected: true, label: "beautiful apple"}];
```

```html
<!-- HTML result -->
<tr class="danger id-1">
  <a class="lbl">beautiful apple (selected: true)</a>
</tr>
```

I think some frameworks will be able to do the new challenge. But if a framework only applies the last-changed model value as the element/attribute's whole value, then it will be easy to tell that the framework just proxies the model value directly into el.textContent or attr.value with a getter/setter.

To be honest I'm still not sure about the "Dirty Model" @ryansolid has mentioned; ScarletsFrame can actually reuse the array data with another template element, as in this example. If every container bound to the model is removed from the DOM tree, ScarletsFrame is designed to automatically clean up and turn every model datum back into a regular JS object (removing the getters/setters). Using getters/setters on the model is part of ScarletsFrame's feature set to reduce the complexity of keeping the model's data in sync with the bound template's state, and we can listen to a bound value with a callback, as in this example. The example I provided is just a simple scenario; there are more features and scenarios that can be achieved with only a little effort because of the "Dirty Model".

yeah, the dirty model will boost a lot of libs that currently try to keep it clean.

@ryansolid thanks for putting together the thorough review in #772 (comment). i'm a fan of simply adding visible feature flags to the table according to that categorization (maybe even with manual event delegation?). people should be able to make their own judgement calls as to whether the lib & impl speed comes at an acceptable purity/imperativeness trade-off for their coding style or situation.

i think the rows should briefly describe the flag (as the metrics do) rather than only linking the relevant GH issue.

@StefansArya The reason I'm making the distinction with proxies is not just that it is essentially direct manipulation; after all, the goal of most reactive systems is to achieve that. I would say direct binding of that nature to the DOM element, as a couple of implementations do (not necessarily ScarletsFrame), keeps the model from transcending those nodes; an event-based granular system is basically the same thing with a subscription mechanism on top. It's that, and that it's testing a different thing. I was noting that this implementation opens the door for abuse specifically because it's easy to cache that one row, whereas not putting it on the model doesn't.

The reason the dirty model is awkward goes beyond the fact that it can't be shared (although in a couple of implementations it cannot be); it's that the selected state, arguably a UI state, becomes global. Sometimes maybe that is desirable, like the different views of the same thing you showed in your example. But I mean more like: you have the same list shown multiple times, being selected temporarily (i.e. not something you'd store in a DB) for different reasons. Do we add 3 selected states here? What if you go somewhere else in the app, should it still be selected? Does selected even make sense on the model?


@krausest I definitely would, and have, implemented it that way in real apps where it was causing performance overhead (KnockoutJS with Web Components is at times a bit of a mess). The challenge is that I, and I think many framework writers, took this test to exist in the suite because it's trying to test a different case. Putting it on the row is basically the same as a partial update of every 10th row. I would go as far as to say that if people had thought it was ok to put selected on each row, the DOM hack might never have existed.

The problem is, if posed with this problem in isolation as a benchmark, wouldn't you just do the fastest method? I think people have mostly been pretty respectful of the data setup part, even if it means wrapping it with their own primitives. Some of the dirty model methods don't even put selected in the initial creation, so as not to make that part of the benchmark different.

But you confirmed what I suspected. It is in every library's best interest to change their implementation this way. It just tests a thing that we are already testing elsewhere. I think this comes down to which goal of the project is more important. From the perspective of common cases solved by every library, I think it's fine to dirty the model.

But from the perspective of the framework author solving certain classes of problems, it is a shame. I.e., I used the selection test to arrive at a different solution under those constraints. I haven't tested recently, but dirtying the model might actually be faster than what I came up with. I acknowledge this class of problem might not be real and this constraint might be imagined. I don't believe that, from my experience, but I can't necessarily convince someone else of it. I just assumed the test was included because it poses a different challenge for the library, not just because it's a common thing someone would do. Most tests here test a unique thing.

I see. I created #800 for the dirty model and will flag the implementations above. And #801 is for explicit event delegation.

@krausest I guess we will need different visualizations at some point or we are going to have a lot of red. From the note on the dirty model it is clear it isn't considered a bad thing. Is it strange that I'm waiting to see what the visual looks like before I go and convert a bunch of implementations over to selected-on-the-model rows? Saying this is ok makes it something I feel compelled to do to get the best numbers.

Truthfully, if this is ok I'm not sure it is even a thing anymore. People would just write their implementations differently from the start. There is no real reason not to do it the more performant way.

i'm in the camp of keeping the models clean since i usually consider the models to be data and selected to be transient ui state. i'm not sure the statement of it being "good practice" is warranted as a general recommendation. i think it's sufficient to note which implementations do this but nothing beyond that. i don't plan to update domvm to use a dirty model just to get better numbers. @ryansolid i find the fast & clean model implementation far more impressive than getting even lower numbers in your case. if a dirty model removes some other bench-specific hacks, then maybe it's worth it, but not just to boost perf, imo.

@adamhaile You're certainly right about the precision, and I really considered reducing it, but wouldn't the necessary rounding make the ranking quite unstable? (E.g. dominator 1.24 => 1.2, domvm 1.25 => 1.3; I guess that won't be as stable as we'd want it to be.) But if I find the time I'll try to make the compare mode easier to use. I think this could help.

Since your range has a floor at 1.00, truncation would be better than rounding, as it gives you equal-sized buckets. Aka 1.0000 to 1.0999... all go to 1.0, 1.1000 to 1.1999... go to 1.1, etc.
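
Using the dominator/domvm numbers quoted above, the difference is easy to check in plain JS:

```js
const round = f => Math.round(f * 10) / 10;
const trunc = f => Math.floor(f * 10) / 10;

round(1.24); // 1.2 \  rounding splits these neighbors
round(1.25); // 1.3 /  across two buckets
trunc(1.24); // 1.2 \  truncation keeps them together in the
trunc(1.25); // 1.2 /  equal-sized bucket [1.2, 1.3)
```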

Stability would be much better. For instance, Surplus was listed sixth in the m83 rankings but first in the m84 ones. That's made-up precision: in truth, all you can say from the data is that several frameworks tied for first. Truncating before ranking would show that. Some frameworks that regularly score near an edge might flip between buckets, but that's much less churn than there is at present, where all the frameworks jump around depending on how close they are to their neighbors.

None of this is meant as any kind of slam on your benchmark. Not at all! One of the things that makes this benchmark awesome is all the work you've done to remove as much variability as possible. It's the best tool out there; it just unfortunately doesn't have the power to do what it's claiming, which is provide a framework-by-framework ranking. That's without even getting into the question of whether precision = accuracy, aka whether this benchmark is saying something "true" about the frameworks. The latter is why I'd argue for a fairly conservative single digit of precision in the rankings.

A preview for the new result table can be seen here:
https://krausest.github.io/js-framework-benchmark/2020/new_result_table.html

It features a few changes:

  • A (hopefully) simple to use comparison mode
  • It defaults to showing the median
  • It filters out all frameworks that have errors (HTML structure, non-keyed swap for a keyed implementation), severe cheats (#772, not data-driven) and any other cheat (all other notes), except the reference implementations

I think these are all good improvements, or at least give a more directed view of what is going on. I think hiding errored implementations by default is good too. I think eventually we could move to hiding severe cheats by default, but having the filter goes a long way. Using the median definitely plays to more consistent-looking results.

You all are missing a key idea behind all the frameworks, and skewing the results for yourself and everyone else. The "rules" for how the rows are created/destroyed/added/updated and swapped should not be dictated by your "vanilla js is a good guidepost" mentality, or keyed vs non. It's all silly imho.

Instead, you missed an important ingredient that is skewing all the results and producing false positives. The missing ingredient is the "framework coding style". Here's what I mean: in a real-world React app, you'd code a different way, a "real world" practical way according to the rules and design hooks of the framework, vs. writing benchmark-winning code -- "cheating" with some vanillajs and bending the rules and design of said framework.

So you cannot validate the rules of benchmarks by being vanillajs or not; you need physical human eyes to see when someone writes a benchmark that is not in the mission statement and advertisement of the framework's design. So all of these results are not right then.

I'm not sure it's ignored. It's that in my opinion there are 2 things that need to be protected:

  1. Idiomatic implementation of the library
  2. Value of the comparison

The first is definitely harder to judge or evaluate, which is why it is hard to make much of a statement about what I refer to as the "dirty" model. But talking about (2) for a minute: keyed vs non-keyed is a severe implementation detail that changes functionally what is happening. I could say that my library is idiomatically non-keyed, but I could also say that using a <table> is an anti-pattern for Framework S and you should use <ul>. At a certain point there needs to be a functional constraint to define the expected behavior of the benchmark for comparison.

Output HTML is a good one (although hard for a real Web Component case). Keyed/non-keyed is more than reasonable here for the same reason. That is not just an implementation detail; it changes what the benchmark is doing. It affects the output DOM nodes.

Other considerations like event delegation or dirtying the model also change what the test is doing, which is a little concerning. However, we don't ban event delegation, because many libraries do it internally. You could say that the explicitness goes against (1), but many libraries doing this have actually recommended explicit event delegation in their docs. The dirty model is something the implementation chooses to do and could be considered idiomatic, but it changes the test from an intentional O(n) operation to an O(2) operation. There is already a test case for the latter. Again, we probably aren't going to crack down on that, even though from one perspective it goes against what is being tested.

Ultimately it comes down to, say in benchmark 4, what is being tested:

  1. How a library performs transferring a single selected state in a list.
  2. How a library performs doing a delegated state change that potentially affects O(n) rows.

I mean there are a few definitions in between. How about:
How a library performs a delegated UI state (non-model) update against all the rows in the list.

I think it could be possible to state the intent of the first 9 tests in a way that clearly indicates what they should be testing.

Back on (1):
The React examples in particular are pretty idiomatic; when I wrote the Hooks one I even had feedback from Dan Abramov himself on how to best represent them. We have a lot of the framework authors themselves involved or writing these implementations. And even then they are willing to be a bit loose on what's idiomatic. After all, we build these things to be flexible enough to solve the problem, and when framework authors write code in their own library it might not look like everyone else's. Ever read a Twitter thread where the React core team tries to explain to someone how they should be using hooks? It's pretty apparent.

Hacking in VanillaJS actually doesn't change what is functionally being tested; it just removes the value of comparing different solutions and is likely at odds with (1). But the criteria for what is at odds are pretty difficult. Stage0 is supposed to work that way. That's fine, but it also leaves less to compare. It's definitely arbitrary when someone can more or less just wrap a DOM element in a proxy and be doing direct updates. Unfortunately I don't see what we can do here other than call out really obvious imperative escape hatches, and even that is arbitrary to a degree if it is considered idiomatic.

Based on the link you mentioned, I created a simple benchmark for jQuery and its friends. I included ScarletsFrame in the test as a sample framework because it has jQuery-like features.

These types of benchmarks are far from real-world operations (who wants to calculate the length!). In contrast, the benchmarks in this repository resemble real-world situations quite well.

makes no sense

@krausest I'm not thrilled with this tagging of Mn. The implementation certainly is data-driven. I'm certainly willing to make modifications to adjust it if necessary, but I'm having a hard time understanding why direct DOM manipulation is a problem if the solution is also data-driven. I'll grant that both Mn implementations are fairly extreme, and though any of the tactics we took are certainly used somewhere, I don't know that every perf consideration would be considered standard fare... What I do know is that for very critical views within an app, any and all of these tactics could be used, and that, to me, is precisely what we've done in this benchmark.

I assume the issue comes primarily down to this line: https://github.com/krausest/js-framework-benchmark/blob/master/frameworks/keyed/marionette-jquery/src/Main.js

where instead of re-rendering the template, we're just replacing the text node.

Most implementations of Marionette would just view.render() here, and that would certainly work. But because Marionette is basically just JavaScript, you can do a quick direct modification, bypassing an entire template render, if you're trying to eke out performance -- which seems completely legitimate to me.

Just because some libraries have to avoid direct dom modification because their implementation requires doing it virtually does not mean that simple solutions should be discounted.

@paulfalgout I understand your point, and I hesitated to mark those implementations for a long time. But what we want to measure is the performance of a framework, not the performance of a mixture of vanillajs and the framework (because we already know how fast vanillajs is; though it's another interesting question whether frameworks make that mix easy). I understand that this argument is harder to accept for frameworks that embrace direct DOM access. Setting e.g. the class for the selected row directly gets the job done, but it can't be compared to a pure data-driven implementation (like react), so I think it's fair to mark the former implementation and leave it up to the contributor to decide whether he prefers a "cleaner" or a faster implementation.

I suppose I understand the argument from the purely data-driven perspective, but "direct dom manipulation" is not a good indication of that. In Backbone, for instance, views render into the view's el. If the view's el needs an added class, then adding/removing that class directly is the only way to make it work. But doing that could certainly be driven off a data event and be unrelated to user input. And it seems entirely arbitrary to require that libraries make all DOM modifications through abstractions or be flagged unclean.

@paulfalgout Do you have a suggestion how to handle that situation better?

In the spirit of the benchmark it should be our goal to flag e.g. a react implementation that sets the class name of a selected row in an event handler directly on a dom element. It breaks with react's abstraction and reports the performance of a vanillajs implementation.

I only came up with the following idea (but I'm not sure whether it's actually a big difference to the current approach):

Would a visible categorization help? That wouldn't give the impression that only (fully) data-driven frameworks are correct in some sense.

A fully data-driven implementation would have to use only the data to perform all rendering and updates, and the selection state must be kept per table; no state may be kept in the DOM. Reaching out to the DOM would be marked with an issue.
Implementations that are not fully data-driven are allowed to use DOM operations (and won't get an issue marker if they do). A drawback for this category is that the performance statement is rather weak: good performance in a specific benchmark doesn't imply the framework is fast, since it could just be due to a fast direct DOM manipulation.

This brings me to the point that it might not make a big difference: maybe it's just adding the word "fully" in front of data-driven in this issue and adding some text saying that this could be intentional by the design of the framework?

It seems better for data driven to be an accomplishment, than a red flag for not doing it.

But I think my main point is that direct dom manipulation alone is not an indication of not being data driven?

Really, for this metric in this benchmark, it's that we want to make sure the event handler isn't doing the work. What if there were a window.selectFirst and window.deleteFirst (or something like that) which were expected to do only one action, updating the data and not the DOM? I'm not sure what would work for all implementations, but it seems like there should be an easier way to enforce/encourage data-driven via something measurable.
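
One hypothetical shape of that hook (window.selectFirst is this comment's proposal, not an existing API, and `appState` stands in for whatever store the implementation uses):

```js
// The benchmark driver would call this instead of clicking; the contract is
// that it performs exactly one data update and never touches the DOM itself.
window.selectFirst = () => {
  appState.selected = appState.rows[0].id; // data only...
  // ...the framework's own binding layer must propagate it to the DOM.
};
```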

I guess data-driven isn't really the only term we are looking for. Declarative updates maybe is more accurate?

All change is going to be propagated from the event regardless of the library, no matter how many layers of indirection we go through. At one end there is a click event dispatched, and at the other end there is el.className = "danger". If all data-driven means is that we maintain the state ourselves (and that is the source of truth), then at minimum all an implementation needs to do is store the state and only write to, but not read from, the DOM. When you control both ends, or even the whole tunnel, in the end-user implementation, the data-driven-ness is arbitrary.

This is different from libraries where, unless you opt into imperative escape hatches, you only get to load data in from the front and all data propagation is handled internally. It's easy to call those data-driven, since there is no other way for them to function; all that is exposed to the end user is data. You pass in data and the DOM output is a black box. There is almost always a way to opt out of that behavior, but it is less interesting, since then all implementations more or less become the same thing.

But not all libraries are equal here. So while I might argue that testing the declarative abstraction is the most interesting thing in this benchmark, it's hard to penalize libraries that don't have those features; it is worth acknowledging, though, when a library isn't held to the same constraints. Direct DOM manipulation could perhaps be argued as data-driven, but definitely not declarative.

Examples of Imperative Escape Hatches:
useEffect
ref
autorun
subscribe
on_____
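
For instance, in React terms (my illustration, not from any implementation here), a ref plus an effect reads as declarative code but writes the DOM imperatively:

```jsx
import { useEffect, useRef } from "react";

function Row({ label, selected }) {
  const tr = useRef(null);
  // useEffect + ref: an escape hatch that bypasses the declarative props
  // and performs a direct DOM write.
  useEffect(() => {
    tr.current.className = selected ? "danger" : "";
  }, [selected]);
  return <tr ref={tr}><td className="lbl">{label}</td></tr>;
}
```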

And that's the thing: events are imperative escape hatches (unless you wire them up like RxJS, turning them into data immediately). Where to draw that line is a lot more arbitrary, except, arguably, at actually writing the DOM (finishing the loop). All discussion of what is a proper implementation comes down to what happens in the imperative part.

I don't know what to do with this per se. But the crux of this whole thing seems to lie around this. We've been throwing data-driven around but I think we may have actually meant declarative.


Querying or setting a virtual node (or any other intermediate view-model abstraction) is not any more "data driven" than using the DOM's native model.

As such, the only truly "data driven" way would be a purely functional view: data => freshAbstraction

The declarative/imperative argument to me is mostly ideological.

In this case, Backbone does not have a way to declaratively update a view's root element after construction. This is due primarily to the library's event delegation choices. For those who can't stand the .addClass solution, the suggestion is to allow an extra wrapping tag, but that is impossible for this table benchmark.

Regardless, I'd still argue that this isn't something that needs to be flagged. There is a cost to the abstractions. I could argue that if we're flagging implementations that aren't purely declarative, then we should also be flagging how many levels of abstraction the developer is away from just JavaScript. We should be concerned that the only developers who will know how JavaScript actually works are the maintainers of the jsx library. That's not a good argument, but I find it ideologically similar to the declarative-purity requirement suggested.

Similarly, there's some cost to keeping things immutable. Should we flag for mutability because it isn't "clean", and allowing mutability penalizes libraries that enforce immutability?

If a developer is choosing a library simply by looking at the benchmark without considering the implementation, that's on them. Honestly, I don't really think this benchmark has much value outside of library authors. I mean, it's cool and all, but the comparison itself isn't super useful because the methodology between the libraries can be so different. Utilizing this benchmark led Marionette to make substantial non-breaking changes to portions of our code to improve performance, with no negative effect on users. The improvement was made within our abstractions. That's what this benchmark is useful for: competing against previous versions of your own library. Improving your benchmark score by updating your library.

All this other flagging is just shaming other ideologies or library authors that have purposefully skirted getting any value out of the benchmark. That doesn't make the benchmark better.

The methodology can be different, yet interestingly it's all sort of really similar; more so than one would think. As a library writer I do find it incredibly helpful to see improvements/regressions in my own libraries, but I also think there is value in the comparison. I'm constantly comparing against other libraries as I work on new libraries and on implementations for existing ones.

And valuing the comparison, the abstraction is what you are testing. There is a reason there are so many libraries sitting under clean when they could all fall into the other categories for a performance gain. The filters help a lot from that perspective. I'm convinced we can find a happier place.

It is no secret that @krausest and others have made it clear there is no particular interest in adding jQuery to this benchmark. And I can see an argument for discouraging such implementations in a clear way. However, like it or not, this benchmark has grown into a much bigger thing than the original VDOM comparison it was 4 years ago.

I think one option would be to categorize without being so in-your-face negative. Instead of referring to things as cheats, which implies intent or mistakes, and blasting red everywhere, maybe it's a matter of just changing the language a bit. Don't get me wrong, any categorization of a minority group is going to carry a stigma, but at least we wouldn't put a red splotch against the library's name. Maybe a bold link is sufficient. Unless there are errors; then it's deserved.

As I see it these are categorizations we have right now:

  1. error/cheats - This is mostly historical, as we found the issues after the fact, and there is a reasonable position to remove some of these. That being said, we've had libraries say they can't function the prescribed way. Technically I think breaking the keyed constraint is pretty much a non-starter here.

  2. manual DOM manipulation - These implementations use direct DOM modification in the end-user code, meaning specific DOM-updating code written and tailored to these specific tests.

  3. manual event delegation - Event delegation done in the implementation rather than the library. This means in-the-open tracing of the DOM tree off the event object. Library-level explicit event delegation (using a helper or binding-level syntax), like domc, solid, or inferno, isn't in this category and could be opened as another category if desired. This to me is an implementation detail and doesn't change the nature of any of the tests.

  4. view state on the model - These implementations move the selected state onto each view item. While again a perfectly fine thing to do, it changes the nature of the select rows test. Every library that does this could do it the other way, were it not for the fact that it's a performance boost. Every library could benefit from it, so it definitely deserves categorization. Personally I think this is not following the spec of the test, in the same way as keyed/non-keyed, but categorization would be sufficient.

A note on manual scheduling: these are implementations that have things like requestAnimationFrame baked into the end-user code. For the most part, libraries that cheated with this method have removed it, and the only library I know of in this category right now says that it is idiomatic to schedule updates in their library. If it were a blatant cheat (like only wrapping clear rows) I'd consider it an error/cheat. I'm not sure this deserves a category of its own.

So I'm proposing we remove the red backdrop on non-error/cheats. And update the filters to something like:

Show Implementations with: None All

  • Error/Cheats
  • Manual DOM manipulation
  • Manual event delegation
  • View state on the model

I'm used to hitting those none toggles so no different to me.

@paulfalgout @krausest is this closer to a middle ground? I strongly believe the categorization is useful. But I can't make the call on how disapproving we should be of manual DOM manipulation, as I've walked the line on that one myself a lot.

Granted it's hard to enforce exactly, but I do think it's reasonable to mark anything previously described as an error or cheat.

Otherwise, I wonder if there's a way to flip the other designations so that they're positive? Like "clean model" or, if that is too polarizing, "separate view state" or something like that. Such that in some sense we're incentivizing certain setups.

@paulfalgout I updated the results table along the lines of @ryansolid's suggestion. I couldn't pick up your idea with the opposite naming yet, but I'm looking into it.

@paulfalgout @ryansolid
This could be the positive wording for the selection. Of course I'm open to better wording.

@ryansolid One major drawback: you'd have to get used to clicking "All" instead 😃

Do you like that better? I'm not really sure I prefer it, since I'll keep the issues the way they are, and this opposite wording makes it IMO harder to connect the selection and the issue.


I think it's important to remember that the purpose of the benchmark is not for the sake of the frameworks themselves (or the person writing the benchmark)... it's for the sake of users who want to evaluate the performance of different frameworks.

So it's pointless to have a React implementation which does a lot of direct DOM manipulation, because that's not idiomatic in React, and so a user would get a false impression: they will think React is really fast, but then when they write idiomatic React code in their own app it ends up being far slower, since their app isn't doing direct DOM manipulation.

So it's not really about direct DOM manipulation (or "data driven", or whatever), it's purely about idiomatic usage of the framework. And what is considered idiomatic varies from framework to framework (and is somewhat subjective), so that's really hard to do without causing a lot of drama.

So what I would say is to have two categories: idiomatic and fastest. Fastest can use basically any trick it wants (while staying within the framework), whereas idiomatic is supposed to be extremely uncontroversial standard code, without any performance tricks at all. The idiomatic code should be extremely simple and clean, the sort of code you would see in a tutorial for the framework.

It's okay for the idiomatic benchmark to use direct DOM manipulation or event delegation, as long as that is considered idiomatic for that framework. And what is considered "idiomatic" should be conservative (not permissive). If there is doubt over a performance trick, then the performance trick shouldn't be used. If you want maximum permissiveness then you should create a "fastest" benchmark instead.

This provides useful information for the user: they can see what the typical performance of a framework is (when written in idiomatic style), and also see what the potential maximum performance is for a framework (when using escape hatches and kludges).

In real apps you end up needing both of those things, since real apps are first written in idiomatic style and then kludges are put in when extra speed is needed.