Add pigz to accelerate .tar.gz by 14x!
andy108369 opened this issue
Andrey Arapov commented
We should add `pigz` as a compression tool, and probably default to it.
The results speak for themselves:
- tar - 600 MiB/s
- tar.gz - 19.7 MiB/s
- tar.zst - 46.6 MiB/s
- tar.gz (PIGZ) - 275 MiB/s !
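For reference, the "14x" in the title is just the ratio of the rates listed above. A tiny sketch to reproduce it (the `speedup` helper is a hypothetical name used only for illustration):

```shell
# Ratio of two throughput figures, rounded to one decimal place.
# Usage: speedup <new MiB/s> <baseline MiB/s>
speedup() {
    awk -v new="$1" -v base="$2" 'BEGIN { printf "%.1f\n", new / base }'
}

speedup 275 19.7    # pigz --fast vs gzip -1
speedup 46.6 19.7   # single-threaded zstd vs gzip -1
```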
The secret is simple: `gzip` is constrained to a single thread, while `pigz` can use multiple threads to perform the compression.
`pigz` does the same thing as `gzip`, but it distributes the work across multiple processors and cores while compressing, which considerably speeds up compression. (Decompression is still mostly single-threaded, so the gain there is much smaller.)
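If `pigz` became the default, a script would probably still want to fall back to `gzip` on hosts where it is not installed. A minimal sketch of that idea, assuming a POSIX shell; `pick_gzip` and `tar_gz_dir` are hypothetical helper names, not anything that exists in the project:

```shell
# Print the name of a gzip-compatible compressor:
# pigz when it is on PATH, plain gzip otherwise.
pick_gzip() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

# Stream directory "$1" to stdout as a .tar.gz archive.
# Both tools accept -1 (fastest level), so the call is the same either way.
tar_gz_dir() {
    tar -c -C "$1" . | "$(pick_gzip)" -1
}
```

Usage would look like `tar_gz_dir "$SNAPSHOT_DIR" > snapshot.tar.gz`. The output is ordinary gzip data in both cases, so whatever consumes the archive should not need to change.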
# TAR (no compression)
root@rpc:~# tar c -C $SNAPSHOT_DIR . 2>/dev/null | pv -petrafb -i 1 -s $SNAPSHOT_SIZE > /dev/null
8.20GiB 0:00:07 [ 988MiB/s] [1.17GiB/s] [> ] 1% ETA 0:10:26
19.0GiB 0:00:28 [ 403MiB/s] [ 693MiB/s] [==> ] 2% ETA 0:17:48
22.4GiB 0:00:37 [ 383MiB/s] [ 620MiB/s] [====> ] 3% ETA 0:19:49
24.4GiB 0:00:42 [ 405MiB/s] [ 594MiB/s] [====> ] 3% ETA 0:20:37
^C.8GiB 0:00:43 [ 385MiB/s] [ 589MiB/s] [====> ] 3% ETA 0:20:46
# TAR.GZ (`gzip -1`)
root@rpc:~# tar c -C $SNAPSHOT_DIR . 2>/dev/null | gzip -1 | pv -petrafb -i 1 -s $SNAPSHOT_SIZE >/dev/null
137MiB 0:00:07 [19.4MiB/s] [19.7MiB/s] [> ] 0% ETA 10:43:41
451MiB 0:00:23 [20.3MiB/s] [19.6MiB/s] [> ] 0% ETA 10:45:54
550MiB 0:00:28 [20.1MiB/s] [19.7MiB/s] [> ] 0% ETA 10:43:50
610MiB 0:00:31 [20.5MiB/s] [19.7MiB/s] [> ] 0% ETA 10:42:45
^C31MiB 0:00:32 [20.2MiB/s] [19.7MiB/s] [> ] 0% ETA 10:42:11
# TAR.ZST (`zstd`, v1.4.8)
root@rpc:~# tar c -C $SNAPSHOT_DIR . 2>/dev/null | zstd -c $zstd_extra_arg | pv -petrafb -i 1 -s $SNAPSHOT_SIZE > /dev/null
452MiB 0:00:10 [47.1MiB/s] [45.3MiB/s] [> ] 0% ETA 4:39:39
691MiB 0:00:15 [48.2MiB/s] [46.1MiB/s] [> ] 0% ETA 4:34:46
1.18GiB 0:00:26 [47.0MiB/s] [46.4MiB/s] [> ] 0% ETA 4:32:28
1.46GiB 0:00:32 [45.5MiB/s] [46.6MiB/s] [> ] 0% ETA 4:31:35
^C50GiB 0:00:33 [47.6MiB/s] [46.6MiB/s] [> ] 0% ETA 4:31:22
# TAR.GZ with PIGZ
root@rpc:~# tar c -C $SNAPSHOT_DIR . 2>/dev/null | pigz --fast | pv -petrafb -i 1 -s $SNAPSHOT_SIZE > /dev/null
861MiB 0:00:03 [ 294MiB/s] [ 287MiB/s] [> ] 0% ETA 0:44:05
2.15GiB 0:00:08 [ 276MiB/s] [ 275MiB/s] [> ] 0% ETA 0:45:55
4.35GiB 0:00:16 [ 278MiB/s] [ 278MiB/s] [> ] 0% ETA 0:45:18
7.07GiB 0:00:26 [ 287MiB/s] [ 278MiB/s] [> ] 0% ETA 0:45:06
8.60GiB 0:00:32 [ 267MiB/s] [ 275MiB/s] [> ] 1% ETA 0:45:30
^C
PikachuEXE commented
Use latest zstd and set thread to 0 to use all cores
Andrey Arapov commented
> Use latest zstd and set thread to 0 to use all cores
Very good point!
With `-T0` (or with the `ZSTD_NBTHREADS=0` environment variable exported), it runs as well as `pigz`:
root@rpc:~# tar c -C $SNAPSHOT_DIR . 2>/dev/null | zstd -c -T0 | pv -petrafb -i 1 -s $SNAPSHOT_SIZE > /dev/null
158MiB 0:00:01 [ 156MiB/s] [ 156MiB/s] [> ] 0% ETA 3:33:51
981MiB 0:00:03 [ 388MiB/s] [ 327MiB/s] [> ] 0% ETA 1:43:31
1.51GiB 0:00:05 [ 255MiB/s] [ 308MiB/s] [> ] 0% ETA 1:49:49
1.98GiB 0:00:07 [ 227MiB/s] [ 289MiB/s] [> ] 0% ETA 1:56:54
2.45GiB 0:00:09 [ 240MiB/s] [ 278MiB/s] [> ] 0% ETA 2:01:32
3.17GiB 0:00:12 [ 253MiB/s] [ 270MiB/s] [> ] 0% ETA 2:05:07
3.93GiB 0:00:15 [ 267MiB/s] [ 268MiB/s] [> ] 0% ETA 2:06:09
4.92GiB 0:00:19 [ 248MiB/s] [ 264MiB/s] [> ] 0% ETA 2:07:35
^C68GiB 0:00:22 [ 258MiB/s] [ 264MiB/s] [> ] 0% ETA 2:07:45
Closing, as one can export `ZSTD_NBTHREADS=0`.
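For anyone landing here later, a rough sketch of that workaround, assuming the `zstd` command-line tool is what does the compression (the environment variable is read by the CLI); `tar_zst_dir` is a hypothetical helper name:

```shell
# Export once; every later zstd invocation from this shell inherits it.
# Per the discussion above, 0 asks zstd to use all available cores.
export ZSTD_NBTHREADS=0

# Stream directory "$1" to stdout as a .tar.zst archive.
# No -T flag is needed here because the exported variable applies.
tar_zst_dir() {
    tar -c -C "$1" . | zstd -c
}
```

Usage would look like `tar_zst_dir "$SNAPSHOT_DIR" > snapshot.tar.zst`. Passing `-T0` on the command line, as in the benchmark above, does the same thing for a single invocation.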