elastic / rally

Macrobenchmarking framework for Elasticsearch

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Start using `.options()` for transport parameters

b-deam opened this issue · comments

As part of the ES client upgrade to 8.x the client now requires what were previously request level parameters to be set at the client's respective transport level.

For example, perhaps the most pervasive change here is the previously global, per-request setting request-timeout:

# 7.x DEPRECATED USAGE (Don't do this!):
client.search(index="blogs", request_timeout=180)

# 8.0+ SUPPORTED USAGE:
client.options(request_timeout=180).search(index="blogs")

We need to refactor all of our runners (and custom runners in rally-tracks) to set these parameters at the transport level.

Note:

When we move to setting options within runners we need to consider that the underlying elasticsearch-py library reinstantiates the client:
https://github.com/elastic/elasticsearch-py/blob/2b297b2922ab11a1439296737812d9799d8a0025/elasticsearch/_async/client/__init__.py#L500

I think the solution here is probably to override the options method on the respective RallyAsyncElasticsearch and RallySyncElasticsearch clients and do something like this:

- client = type(self)(_transport=self.transport)
+ client = type(self)(_transport=self.transport, distribution_version=str(self.distribution_version))

Mechanically this was going to be a reasonable amount of work on its own, but it turns out that this new approach incurs a relatively high client side overhead.

After merging #1705 I noticed that some nightly benchmarks were displaying higher than normal latency (latency = processing_time + service_time). Some reproductions and profiling later and it's clear that the .options() approach is slower.

The absolute numbers aren't large, we're talking only a few hundred microseconds to a millisecond on each request, but some of our benchmarks are sensitive enough to notice this.
image

So using some static responses and our in-built profiling (--enable-driver-profiling) it became clear that #1705 .options() causes us to reinstantiate a new client on every request:

name                                                                                                                                          ncall     tsub      ttot      tavg      
/home/esbench/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py:1845 _UnixSelectorEventLoop._run_once                              3976      0.072129  5.481492  0.001379
/home/esbench/.pyenv/versions/3.11.2/lib/python3.11/asyncio/events.py:78 TimerHandle._run                                                     3977      0.011262  5.254178  0.001321
/home/esbench/rally/esrally/driver/driver.py:1843 AsyncExecutor.__call__                                                                      1         0.078444  5.195176  5.195176
/home/esbench/rally/esrally/driver/driver.py:1796 AsyncProfiler.__call__                                                                      1         0.001048  5.187659  5.187659
/home/esbench/rally/esrally/driver/driver.py:1957 execute_single                                                                              2000      0.026569  4.796528  0.002398
/home/esbench/rally/esrally/driver/runner.py:290 NoCompletion.__call__                                                                        2000      0.006317  4.727379  0.002364
/home/esbench/rally/esrally/driver/runner.py:404 AssertingRunner.__call__                                                                     2000      0.007228  4.721062  0.002361
/home/esbench/rally/esrally/driver/runner.py:338 MultiClientRunner.__call__                                                                   2000      0.013656  4.713834  0.002357
/home/esbench/rally/esrally/driver/runner.py:862 Query.__call__                                                                               2000      0.051384  4.698539  0.002349
/home/esbench/rally/esrally/driver/runner.py:1027 _request_body_query                                                                         2000      0.009655  3.665306  0.001833
/home/esbench/rally/esrally/driver/runner.py:1165 Query._raw_search                                                                           2000      0.015916  3.654368  0.001827
/home/esbench/rally/esrally/client/asynchronous.py:277 RallyAsyncElasticsearch.perform_request                                                2000      0.055657  3.633689  0.001817
/home/esbench/.local/lib/python3.11/site-packages/elastic_transport/_async_transport.py:176 RallyAsyncTransport.perform_request               2000      0.097287  3.163554  0.001582
/home/esbench/.local/lib/python3.11/site-packages/elastic_transport/_node/_http_aiohttp.py:126 RallyAiohttpHttpNode.perform_request           2000      0.075957  2.669151  0.001335
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client.py:1140 _RequestContextManager.__aenter__                                    2000      0.009856  1.970503  0.000985
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client.py:364 ClientSession._request                                                2000      0.147258  1.960647  0.000980
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/__init__.py:486 RallyAsyncElasticsearch.options                 2000      0.026028  0.937524  0.000469
+ /home/esbench/rally/esrally/client/asynchronous.py:260 RallyAsyncElasticsearch.__init__                                                       2000      0.020966  0.863369  0.000432
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/__init__.py:128 RallyAsyncElasticsearch.__init__                2000      0.127874  0.826059  0.000413
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/_base.py:375 AsyncSearchClient.__init__                         68000     0.246066  0.694591  0.000010
/home/esbench/.local/lib/python3.11/site-packages/aiohttp/client_reqrep.py:247 StaticRequest.__init__                                         2000      0.055712  0.562107  0.000281
+ /home/esbench/.local/lib/python3.11/site-packages/elasticsearch/_async/client/_base.py:238 RallyAsyncElasticsearch.__init__                   70000     0.295971  0.414709  0.000006

Every time we instantiate a new client we also instantiate all of the namespaced clients too. In the above output we call 2000 search requests, which means we're calling <client>.__init__ 34 times per runner call:
https://github.com/elastic/elasticsearch-py/blob/9ecf7331794952e031f76f1c4b57972f31c81517/elasticsearch/_async/client/__init__.py#L428-L461

...
        # namespaced clients for compatibility with API names
        self.async_search = AsyncSearchClient(self)
        self.autoscaling = AutoscalingClient(self)
        self.cat = CatClient(self)
        self.cluster = ClusterClient(self)
        self.fleet = FleetClient(self)
        self.features = FeaturesClient(self)
        self.indices = IndicesClient(self)
        self.ingest = IngestClient(self)
        self.nodes = NodesClient(self)
        self.snapshot = SnapshotClient(self)
        self.tasks = TasksClient(self)
...

I don't think we'll want to allow changing these parameters per-each <runner>.__call__(), so perhaps we need to find a better way of setting the respective .options() once per runner, on initialisation, and not calling .options() again within <runner>.__call__().

@b-deam there is currently also another issue with the existing usage of es.options() that I stumbled upon. The current calls to es.options() (see https://github.com/elastic/rally/pull/1705/files) are ineffective because options returns a new client that needs to be used but we ignore the return value and continue to use the original client. All in all I think this API is not well-suited and this should be fixed directly in the client instead of working around a client API design issue in Rally.

To fix the issue that I mentioned you need to do that:

- es.options(**transport_params)
+ es = es.options(**transport_params)

(for all call sites)