pkgxdev / pkgx

the last thing you’ll install

Home Page:https://pkgx.sh

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Programs installed by pkgx needs to be quicker

felipecrs opened this issue · comments

It would be great if the programs installed by pkgx were quicker. I use pkgx install in my devcontainers to ensure the same environment is used by both developers and CI/CD (Jenkins).

However, I'm noticing a very significant increase in the time it takes to execute the same scripts when using pkgx installed programs. One of my pipelines that usually takes 10 minutes started to take 15 minutes after switching to pkgx.

For a comparison:

> pkgx install node@20.12.1 && ~/.local/bin/node --version

# using pkgx's shim to execute node
> time ~/.local/bin/node --version
v20.12.1
~/.local/bin/node --version  0.37s user 0.30s system 128% cpu 0.519 total

# pkgx's node without any shim
> env +node@20.12.1

> time node --version
v20.12.1
node --version  0.00s user 0.00s system 59% cpu 0.006 total

> volta install node@20.12.1 && ~/.volta/bin/node --version

# using volta's shim to execute node
> time ~/.volta/bin/node --version
v20.12.1
~/.volta/bin/node --version  0.00s user 0.00s system 30% cpu 0.006 total

# volta's node without any shim
> time $(volta which node) --version
v20.12.1
$(volta which node) --version  0.00s user 0.00s system 11% cpu 0.004 total

As you can see, pkgx's shim seems to add 0.5-1s on every invocation. Which is reasonable for day to day use but is unnaceptable for some scripts and environments.

Another comparison is when using aqua-proxy (made with Go), which seems to add around 50ms to each invocation.

Volta's shim is made with Rust, and it is extremely fast. It even looks for a package.json in the current and up folders to find out the proper node version to use. And still, extremely fast.

Maybe shipping some kind of very optimized pkgx-shim binary made with Rust would be the best option in this case.

Hey @felipecrs,

Thank you for bringing this to our attention and for the detailed comparison! 🌟 Your insights are invaluable in helping us improve pkgx.

Based on your feedback, it's clear that the performance of pkgx's shim is not on par with your expectations, especially when compared to alternatives like Volta. The delay introduced by our shim, as you've rightly pointed out, can accumulate significantly in scripts and CI/CD environments, impacting overall efficiency.

We're taking your suggestion seriously and are considering the development of a more optimized pkgx-shim, potentially in Rust, to minimize the overhead. Rust's performance characteristics make it an excellent candidate for this kind of tool, as demonstrated by Volta's success.

In the meantime, here are a couple of workarounds that might help mitigate the issue:

  1. Direct Invocation: For critical scripts where performance is paramount, consider invoking the binary directly from its installed location, bypassing the shim. While this approach requires managing the path more explicitly, it can significantly reduce the overhead in time-sensitive operations.

  2. Batching Commands: When possible, batch operations to minimize the number of shim invocations. This approach can help reduce the cumulative delay in scenarios where multiple pkgx-managed tools are used in succession.

We appreciate your patience and understanding as we work on this. Your feedback is crucial in helping us make pkgx better for everyone. Please keep the suggestions coming, and let us know if there are any other ways we can assist!

Best,
Dosu

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

I think the current optimization would be to add them to your environment:

% pkgx hyperfine "pkgx node --version"
Benchmark 1: pkgx node --version
  Time (mean ± σ):     283.2 ms ±   6.2 ms    [User: 212.4 ms, System: 36.7 ms]
  Range (min … max):   276.8 ms … 297.9 ms    10 runs

% dev
env +nodejs.org
% pkgx hyperfine "node --version"
Benchmark 1: node --version
  Time (mean ± σ):      18.6 ms ±   0.5 ms    [User: 16.2 ms, System: 1.6 ms]
  Range (min … max):    17.8 ms …  20.7 ms    143 runs

the reason is that the dev environment builds the path with your tools, so you don't get those execution lags. if you're using GitHub actions, the dev action, rather than the setup action handles this. In your Jenkins environment, you should be able to do similarly by invoking dev to bring in all the tools defined for your environment.

I agree that the installer shims add too much overhead for those kinds of environments. You can also do the same thing for a script (making it a 1-time cost, instead of a per-call cost) by using a pkgx shebang.

My intention is to not have to refactor my scripts. They already execute in docker containers, which is supposed to provide them the programs in the environment.

And dev can't be used in containers.

> docker run --rm pkgxdev/pkgx dev
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "dev": executable file not found in $PATH: unknown.

Or any environment where shell is not used to invoke commands.

yes, that's a trickier one. max has a few tricks in https://docs.pkgx.sh/run-anywhere/docker, but possibly also something like

RUN pkgx +{list of tools} | sed 's/^/export /' >>.bashrc

could work. definitely a thorny problem.

That only works if bash is used to invoke programs, which is not always the case for my pipelines. :(

yes, it might take some trickery for other shells. if it's just path modifications that are needed, you could probably manipulate that manually as well. just brainstorming paths forward.

yes, it might take some trickery for other shells.

Especially if no shell is used (docker exec for example).

if it's just path modifications that are needed, you could probably manipulate that manually as well. just brainstorming paths forward.

I think this is impossible within a Dockerfile... You can't generate or change ENV instructions from commands. And will probably never be (moby/moby#29110).

But it is not only about changing PATH. Some pkgx programs has important environment variables like JAVA_HOME for openjdk.

yes, we try to avoid needing environment, but it's not perfectly possible.

of course you can always hard-code stuff, but it loses much of the interactive shell magic of pkgx:

RUN pkgx +llvm.org=17.0.1 && ln -s ~/.pkgx/llvm.org/v17.0.1/bin/* /usr/bin/

That's interesting. I built a script like like:

#!/usr/bin/env bash

set -euxo pipefail

pkgx_output=$(
  pkgx \
    +node@20 \
    +npm@10 \
    +deno@1.42 \
    +yq@4 \
    +k3d@5 \
    +helm@3 \
    +kubectl@1.25 \
    +tilt@0.33 \
    +helmfile@0.163
)
set +u
eval "${pkgx_output}"
set -u
unset pkgx_output

# create symbolic links for each pkgx binary in $HOME/.local/bin
# shellcheck disable=SC2016
echo "${PATH}" | tr ':' '\n' | grep "^${HOME}/.pkgx/" \
  | xargs -r -n1 bash -c 'ln -vfs "${1}"/* "${HOME}/.local/bin/"' --

# Test
node --version

But then:

+ node --version
node: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory

Which probably means nodejs requires some of these extra environment variables as well.

Only some pkgs would be ok with this approach, mainly the ones that has no runtime dependencies.

Without leaving JS' world:

hyperfine --warmup 10 --shell=none "./deno-test" "./bun-test" './llrt console.js'
Benchmark 1: ./deno-test
  Time (mean ± σ):      21.5 ms ±   1.1 ms    [User: 10.9 ms, System: 9.1 ms]
  Range (min … max):    19.2 ms …  27.8 ms    142 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./bun-test
  Time (mean ± σ):      11.7 ms ±   0.7 ms    [User: 4.3 ms, System: 7.1 ms]
  Range (min … max):    10.4 ms …  19.7 ms    277 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: ./llrt console.js
  Time (mean ± σ):       3.4 ms ±   0.2 ms    [User: 2.9 ms, System: 1.0 ms]
  Range (min … max):     3.0 ms …   4.1 ms    921 runs
 
Summary
  ./llrt console.js ran
    3.44 ± 0.27 times faster than ./bun-test
    6.34 ± 0.44 times faster than ./deno-test

This indicates that even deno takes 20ms to boot a simple js file. Considering that pkgx takes around 500ms to finally exec the program, it looks like some optimization can be done in pkgx itself as well.

I was thinking about this and I came up with a very good idea, which I believe is easy enough to be implemented and used until we can make pkgx itself faster.

Basically, make the stubs like:

#!/bin/sh
if [ "$PKGX_UNINSTALL" != 1 ]; then
+ if [ -f "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" ]; then
+   set -a
+   . "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh"
+   exec node "$@"
+ fi
  exec pkgx +nodejs.org -- node "$@"
else
  cd "$(dirname "$0")"
  rm node && echo "uninstalled: nodejs.org" >&2
fi

Then, we also need to ensure pkgx always generates and keeps these binenv files up to date when a sync operation is executed.

For PoC purposes:

#!/bin/sh
if [ "$PKGX_UNINSTALL" != 1 ]; then
  if [ -f "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" ]; then
    set -a
    . "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh"
    exec node "$@"
  fi
+ mkdir -p "${PKGX_DIR:-"$HOME/.pkgx"}/binenv"
+ pkgx +nodejs.org | tee "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" >/dev/null

  exec pkgx +nodejs.org -- node "$@"
else
  cd "$(dirname "$0")"
  rm node && echo "uninstalled: nodejs.org" >&2
fi

This writes the binenv file if it hasn't been written yet.

Benchmarking:

image

The results are great. I'd say they pay off easily.

@mxcl what do you think? Would you accept a PR?

I'd like to add that currently pkgx shim is slower than running docker container on Mac.

This:
time pkgx whoami > /dev/null

is slower than this:
time docker run --rm -it alpine whoami > /dev/null