TODO
shenwei356 opened this issue · comments
- add example of -v
- implement retry interval
- add more examples on bioinformatics
- do not send empty data
- support continue
- test more in windows
- avoid mixed line from multiple process, e.g. the first half of a line is from one process and the last half of the line is from another process.
- replacement string {^suffix} for removing suffix
- add flag --eta
Please add automatic detection of whether to use a shell or not (like this: https://github.com/mmstick/parallel/blob/0dd48100e9a29d9a023826c778a5c7e70f9bf464/src/execute/exec_inputs.rs#L40-L45).
OK. I'll use mattn/go-shellwords
go-shellwords doesn't detect multiple commands like foo; bar, sorry. BTW, I'm guessing the reason Go is faster than Rust in this result is whether a shell is spawned.
https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcr4y7f/
I think running all commands using a shell ($SHELL -c for *nix and %COMSPEC% /c for Windows), for both single commands and multiple commands like foo; bar, is fine.
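A rough sketch of that idea in Go, assuming the shell is taken from $SHELL on *nix and %COMSPEC% on Windows. `shellArgs` is a hypothetical helper, and the fallbacks to sh and cmd when the variables are unset are my own assumption, not something rush is confirmed to do.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// shellArgs (hypothetical) returns the shell and the flag used to pass it a
// command string: %COMSPEC% /c on Windows, $SHELL -c elsewhere.
// The "cmd" and "sh" fallbacks are assumptions for unset variables.
func shellArgs(goos string, getenv func(string) string) (string, string) {
	if goos == "windows" {
		if sh := getenv("COMSPEC"); sh != "" {
			return sh, "/c"
		}
		return "cmd", "/c"
	}
	if sh := getenv("SHELL"); sh != "" {
		return sh, "-c"
	}
	return "sh", "-c"
}

func main() {
	shell, flag := shellArgs(runtime.GOOS, os.Getenv)
	// "&&" chains two commands in both POSIX shells and cmd.exe.
	out, err := exec.Command(shell, flag, "echo foo && echo bar").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
	fmt.Print(string(out))
}
```

Passing the OS name and the environment lookup as parameters keeps the selection logic testable without changing the real environment.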
What I mean is "why Rust is always faster". :)
If rush can avoid spawning a shell, it will be faster, I guess.
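To illustrate what "avoiding the shell" could look like, here is a conservative heuristic sketch: only go through a shell when the command contains a character the shell would interpret. `needsShell` and the `shellMeta` character set are hypothetical and deliberately over-cautious; this is not how rush or the Rust implementation is confirmed to decide.

```go
package main

import (
	"fmt"
	"strings"
)

// shellMeta is an assumed, conservative set of characters that a POSIX shell
// treats specially (separators, redirection, expansion, quoting).
const shellMeta = ";&|<>$`*?()[]{}~#!\"'\\\n"

// needsShell (hypothetical) reports whether cmd contains any character that
// requires a shell to interpret. Commands without such characters could be
// split on whitespace and executed directly.
func needsShell(cmd string) bool {
	return strings.ContainsAny(cmd, shellMeta)
}

func main() {
	for _, c := range []string{"echo 1", "foo; bar", "gzip -d a.gz | wc -l"} {
		fmt.Printf("%-24q needsShell=%v\n", c, needsShell(c))
	}
}
```

When needsShell returns false, strings.Fields(cmd) gives an argument list that can be passed to exec.Command without a shell. Anything with quoting is sent to the shell here, which sidesteps the need for a full word-splitting parser.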
I get it. Thank you.
@mattn Running commands within a shell has very little overhead for my Rust implementation when you follow the recommendation to install dash. Here's a comparison of times with and without the shell:
Without Shell
seq 1 10000 | time -v target/x86_64-unknown-linux-musl/release/parallel 'echo {}' > /dev/null
User time (seconds): 0.40
System time (seconds): 2.68
Percent of CPU this job got: 93%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.29
5489.640372 task-clock:u (msec)
With Shell
These are times when the shell is enabled (with dash-static-musl installed)
seq 1 10000 | time -v target/x86_64-unknown-linux-musl/release/parallel 'echo {}; echo {}' > /dev/null
User time (seconds): 0.35
System time (seconds): 2.56
Percent of CPU this job got: 128%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.27
4593.366103 task-clock:u (msec)
Believe it or not, the shell path with dash is actually much faster than the no-shell path. I will be investigating that to see where the bottleneck is in the no-shell codepath.
@mmstick The Rust implementation is indeed faster for this test. And the Go API for running a process needs to call $SHELL -c, so I did not compare the case without using a shell.
What confused me was why rush_linux_amd64 performed so badly on your two computers. On my laptop, for the test seq 1 10000 | time -v $CMD 'echo {}' > /dev/null, rust-parallel is ~4X as fast as rush, but it was >100X faster on your computers.
Here's a fresh result:
$ for cmd in parallel rust-parallel rush; do echo $cmd; seq 1 10000 | time -v $cmd 'echo {}' > /dev/null; done
parallel
Command being timed: "parallel echo {}"
User time (seconds): 28.73
System time (seconds): 30.66
Percent of CPU this job got: 185%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.04
rust-parallel
Command being timed: "rust-parallel echo {}"
User time (seconds): 3.13
System time (seconds): 4.82
Percent of CPU this job got: 312%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.54
rush
Command being timed: "rush echo {}"
User time (seconds): 12.81
System time (seconds): 24.45
Percent of CPU this job got: 274%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.57
Besides, speed is not the No. 1 target for rush now, especially for long-running processes. I use it every day in my bioinformatics analysis and keep trying to improve its usability and stability.
Do you have any AMD hardware? Both of my systems are powered by AMD, so that could be one reason. It could also be the Intel CPU governor not holding its max frequency long enough.
Basically, before I perform my benchmarks, I ensure that all software is closed, that the CPU governor is set to performance via sudo cpupower frequency-set -g performance, and that transparent_hugepages is set to madvise via sudo sh -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled". The Linux distribution that I am operating from is Arch Linux, and I have dash-static-musl installed because of its high performance.
Would it be possible to process a set of commands that is specified in a file, for example like the "::::" argument in GNU parallel?