dprint / dprint

Pluggable and configurable code formatting platform written in Rust.

Home Page:https://dprint.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Provide a mechanism to allow a process plugin to declare it as single-threaded so dprint can scale it

bradzacher opened this issue · comments

Working with nodejs in a process plugin I've found that there's a lot of overhead introduced by the IPC. In a nutshell a format request requires 4 cloned IPC messages to complete - 2 of which contain a file's contents:

  1. dprint asks the plugin to format a file with text
  2. the plugin asks its worker thread to format the file with text
  3. the worker thread responds with either the formatted text or null
  4. the plugin responds with either the formatted text or null

This cloned IPC has a lot of overhead, sadly.
I've been running some tests on Canva's repo and I'm currently averaging ~10ms per file for an unchanged format - which isn't bad, but adds up across ~60k files.

In contrast, I've written a pure-nodejs script for the formatter which spawns 4 child_processes, sends each 1/4 of the entire list of files, and lets each process do its own read-format-write cycle and that runs at ~1ms per file.

I think that if I could remove the nodejs-based worker_thread parallelisation layer and instead have dprint just spawn n instances of the process plugin then things would be much faster and closer to that "reference" implementation perf-wise.

What do you think? Do you think that this feature might make sense - giving a plugin the ability to say "hey I'm single threaded - spin up as many of me as you please" and letting dprint do its own thing as it sees fit based on resource utilisation and availability.

@bradzacher yes, it should be faster for sure.

That said, what do you need a JavaScript-based formatter for? I've found it's way faster to use deno's Rust crates for anything JavaScript based similar to what dprint-plugin-prettier does (https://github.com/dprint/dprint-plugin-prettier). I'm planning to overhaul that plugin to be better soon (going to release a dprint roadmap sometime in the next few days that will include the major changes to that and also will try to prioritize some of this work).

Btw, sorry for not responding to your email. I was in Japan the past two weeks and wasn't really checking it (back now). I think most questions in it have been answered now. Also, thanks for the details on worker_threads! That was intresting.

We want to avoid introducing deno to the repo if we can. Mostly because if we introduce it then we have to maintain it and our team is already stretched kinda thin in regards to ownership and maintenance!

A process plugin seemed like a good middle-ground - a smallish piece of JS code that allows us to use the standard NodeJS version for our environment to improve the perf compared to dprint-plugin-exec. It certainly did improve it a lot - from about ~300ms per file down to ~10ms per file! But yeah still shy of what I'd really want at our (growing) scale.

I actually did want to use dprint-plugin-prettier! However we need to use our forked version of prettier (@canva/prettier) instead of the open source version - but the plugin currently has a hard-coded version bundled (AFAICT?).
The deno usage also put us off just a little as mentioned above - though that's something we could swallow and ignore as a black-box implementation detail.


Btw, sorry for not responding to your email

No problem at all 😃 You gave me more than enough to work with! With your examples I was able to build a fully-working JS process plugin! I can probably share it if you're interested.
The protocol side of things was simple to do - the hardest part has been stably reading the input stream. It took a lot of trial and error to get that working fully - turns out that reading binary data from stdin is a bitch to get right in NodeJS.

However we need to use our forked version of prettier (@canva/prettier) instead of the open source version - but the plugin currently has a hard-coded version bundled (AFAICT?).

Ah, ok makes sense! That said, there is a dprint-plugin-deno-base crate that makes it slighly eaiser to use for a scenario like that: https://github.com/dprint/dprint-plugin-prettier/blob/main/base/Cargo.toml (I extracted this out of dprint-plugin-prettier a few months ago... it is very undocumented though)

It uses the Deno crates, but it's not really Deno. It uses Deno's helper crate wrappers around v8 and then some of the other crates to get some APIs that are used in Prettier. I had to polyfill a bunch of stuff still though, which was a bit of a pain.

Yeah it requires a bundled version atm (I'm planning to support prettier plugins that are configurable soon).

With your examples I was able to build a fully-working JS process plugin! I can probably share it if you're interested.
The protocol side of things was simple to do - the hardest part has been stably reading the input stream. It took a lot of trial and error to get that working fully - turns out that reading binary data from stdin is a bitch to get right in NodeJS.

dprint-plugin-prettier used to work this way and use JS for the communication (and when I exprimented with https://github.com/dprint/node-plugin-base/). It was indeed surprisingly extremely annoying to read from stdin.