Don't git clone elasticsearch on target machines when installing an artifact that's been built on a separate machine
Opened by dliappis
Bug description
When the build subcommand (introduced in #1576) is used to build a certain ES revision from source on machine A, and the artifact is then transferred to the right location (~/.rally/benchmarks/distributions/src) on a different machine B, installing ES with the install subcommand triggers a git clone of the ES repo on machine B. As Rally won't actually build ES from source on machine B, this is both unnecessary and time consuming.
Reproduction
(The following repro runs on a single machine, for simplicity.)
- Build ES:

      $ esrally build --revision=82aeb478dbf83d164473b161801417fc1d59060e
      [INFO] Creating installable binary from source files
      [INFO] Creating installable binary from source files
      {
        "elasticsearch": "/home/dl/.rally/benchmarks/distributions/src/elasticsearch-82aeb478dbf-linux-x86_64.tar.gz"
      }

- Wipe away the src directory (if you are not going to transfer the artifact to another machine):

      ~/.rally/benchmarks/src $ rm -rf elasticsearch

- Run the install command:

      $ esrally install --revision=82aeb478dbf83d164473b161801417fc1d59060e --runtime-jdk=bundled --network-host=127.0.0.1 --http-port=39200 --node-name=es01 --master-nodes=es01 --seed-hosts=127.0.0.1 --cluster-name=es01
      {
        "installation-id": "03a7c86d-268b-4347-9a24-c69580aaa0e2"
      }
      ---------------------------------
      [INFO] SUCCESS (took 200 seconds)
      ---------------------------------

Observe that the source got re-pulled with git:

    $ cd ~/.rally/benchmarks/src/
    $ du -sh elasticsearch/
    1.4G    elasticsearch/

(You can also observe git running during the esrally install with ps or execsnoop.)
Technical notes/tips
There is a comparison done in rally/esrally/mechanic/supplier.py (lines 357 to 358 in 5e13334), where self.file_name is:

    > /home/dl/source/elastic/rally/esrally/mechanic/supplier.py(359)fetch()
        358         ipdb.set_trace()
    --> 359         maybe_an_artifact = os.path.join(self.distributions_root, self.file_name)
        360         if os.path.exists(maybe_an_artifact):
    ipdb> self.distributions_root
    '/home/dl/.rally/benchmarks/distributions/src'
    ipdb> self.file_name
    'elasticsearch-82aeb478dbf83d164473b161801417fc1d59060e-linux-x86_64.tar.gz'

whereas the actual file on disk is ~/.rally/benchmarks/distributions/src/elasticsearch-82aeb478dbf-linux-x86_64.tar.gz (i.e. it uses a shorter SHA).
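
The mismatch can be reproduced outside Rally with a minimal sketch. It is not Rally's code: `artifact_file_name` and `is_cached` are hypothetical helpers that only mirror the file name pattern and the `os.path.join` / `os.path.exists` check shown above, and a temporary directory stands in for the distributions root.

```python
import os
import tempfile

FULL_SHA = "82aeb478dbf83d164473b161801417fc1d59060e"
SHORT_SHA = "82aeb478dbf"  # the abbreviated form that ended up in the artifact name


def artifact_file_name(revision):
    # Hypothetical helper: mirrors the file name pattern seen in this issue.
    return f"elasticsearch-{revision}-linux-x86_64.tar.gz"


def is_cached(distributions_root, revision):
    # Same shape as the check in supplier.py: an exact file name lookup.
    maybe_an_artifact = os.path.join(distributions_root, artifact_file_name(revision))
    return os.path.exists(maybe_an_artifact)


def demo():
    with tempfile.TemporaryDirectory() as distributions_root:
        # Simulate "build": the artifact is written under the short SHA.
        built = os.path.join(distributions_root, artifact_file_name(SHORT_SHA))
        open(built, "w").close()
        # Simulate "install": the lookup uses the full SHA from --revision.
        return is_cached(distributions_root, SHORT_SHA), is_cached(distributions_root, FULL_SHA)


if __name__ == "__main__":
    print(demo())
```

The lookup by full SHA misses even though the artifact for that revision is present, which is what sends the install down the source checkout path.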
I looked into this because it adds considerable time to the install, and realized that we are shortening the revision using git rev-parse --short, which of course results in abbreviated revisions of unpredictable length.
The following change fixes this issue (but should be checked thoroughly regarding any other place it's used -- I believe it should be ok):

    $ git diff
    diff --git a/esrally/utils/git.py b/esrally/utils/git.py
    index 920fb5e..114fee0 100644
    --- a/esrally/utils/git.py
    +++ b/esrally/utils/git.py
    @@ -120,7 +120,7 @@ def checkout_revision(src_dir, *, revision):
     @probed
     def head_revision(src_dir):
    -    return process.run_subprocess_with_output("git -C {0} rev-parse --short HEAD".format(io.escape_path(src_dir)))[0].strip()
    +    return process.run_subprocess_with_output("git -C {0} rev-parse HEAD".format(io.escape_path(src_dir)))[0].strip()
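
For experimenting with the proposed behavior outside Rally, here is a standalone sketch of the same idea. It assumes plain `subprocess` instead of Rally's `process.run_subprocess_with_output` helper, and the `run` parameter is a hypothetical addition that only exists so the function can be exercised without a real repository.

```python
import subprocess


def head_revision(src_dir, run=subprocess.check_output):
    # Stand-in for esrally.utils.git.head_revision (not Rally's actual code):
    # ask git for the full, unabbreviated SHA of HEAD, i.e. no --short.
    # Passing the command as a list avoids shell quoting of src_dir.
    cmd = ["git", "-C", src_dir, "rev-parse", "HEAD"]
    return run(cmd, text=True).strip()


if __name__ == "__main__":
    # Prints the full HEAD SHA when run inside a git checkout.
    print(head_revision("."))
```

With the full SHA returned here, a file name derived from it would match one derived from the `--revision` value used in the reproduction, provided that value is itself a full SHA.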