trinodb / benchto

Framework for running macro benchmarks in a clustered environment

Benchto stops working when one query fails

wanglinsong opened this issue

I am running TPC-H benchmarks with Benchto, but when a query fails, the Benchto driver stops with an ExecutionException. I want Benchto to continue running the remaining queries even if one query fails.

Here are some relevant logs.

13:27:44  20:27:37.990 ERROR [pool-8-thread-1] i.t.b.d.l.LoggingBenchmarkExecutionListener - Query failed: q09 (6/6), execution error: Query failed (#20220921_202619_00023_tyrw7): Query exceeded per-node memory limit of 26.89GB [Allocated: 26.89GB, Delta: 914.75kB, Top Consumers: {HashBuilderOperator=23.10GB, ScanFilterAndProjectOperator=3.56GB, PartitionedOutputOperator=214.35MB}]
13:27:44  20:27:37.990 INFO  [main] i.t.b.d.l.LoggingBenchmarkExecutionListener - Finished benchmark: presto/tpch_medium_parquet.cluster_medium.run_187
13:27:44  Sep 21, 2022 8:27:37 PM org.springframework.context.annotation.AnnotationConfigApplicationContext doClose
13:27:44  INFO: Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@79845c99: startup date [Wed Sep 21 20:03:07 UTC 2022]; root of context hierarchy
13:27:44  Sep 21, 2022 8:27:37 PM org.springframework.jmx.export.annotation.AnnotationMBeanExporter destroy
13:27:44  INFO: Unregistering JMX-exposed beans on shutdown
13:27:44  Sep 21, 2022 8:27:37 PM org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor shutdown
13:27:44  INFO: Shutting down ExecutorService 'defaultTaskExecutor'
13:27:44  20:27:43.080 ERROR [main] io.trino.benchto.driver.DriverApp - Benchmark execution failed: Listener failed with: java.util.concurrent.ExecutionException: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error
13:27:44  java.lang.RuntimeException: Listener failed with: java.util.concurrent.ExecutionException: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error
13:27:44  	at io.trino.benchto.driver.listeners.benchmark.BenchmarkStatusReporter.processCompletedFutures(BenchmarkStatusReporter.java:76) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.execution.ExecutionDriver.executeBenchmarks(ExecutionDriver.java:128) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.execution.ExecutionDriver.execute(ExecutionDriver.java:71) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.DriverApp.main(DriverApp.java:77) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_332]
13:27:44  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_332]
13:27:44  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_332]
13:27:44  	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_332]
13:27:44  	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53) [trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at java.lang.Thread.run(Thread.java:750) [na:1.8.0_332]
13:27:44  Caused by: java.util.concurrent.ExecutionException: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error
13:27:44  	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[na:1.8.0_332]
13:27:44  	at io.trino.benchto.driver.listeners.benchmark.BenchmarkStatusReporter.processCompletedFutures(BenchmarkStatusReporter.java:69) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	... 9 common frames omitted
13:27:44  Caused by: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error
13:27:44  	at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:94) ~[spring-web-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:614) ~[spring-web-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:570) ~[spring-web-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:538) ~[spring-web-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:340) ~[spring-web-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at io.trino.benchto.driver.service.BenchmarkServiceClient.postForObject(BenchmarkServiceClient.java:133) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.service.BenchmarkServiceClient.postForObject(BenchmarkServiceClient.java:122) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.service.BenchmarkServiceClient.finishExecution(BenchmarkServiceClient.java:108) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.service.BenchmarkServiceClient$$FastClassBySpringCGLIB$$2bf18955.invoke(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.retry.interceptor.RetryOperationsInterceptor$1.doWithRetry(RetryOperationsInterceptor.java:74) ~[spring-retry-1.1.2.RELEASE.jar!/:na]
13:27:44  	at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:263) ~[spring-retry-1.1.2.RELEASE.jar!/:na]
13:27:44  	at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:154) ~[spring-retry-1.1.2.RELEASE.jar!/:na]
13:27:44  	at org.springframework.retry.interceptor.RetryOperationsInterceptor.invoke(RetryOperationsInterceptor.java:101) ~[spring-retry-1.1.2.RELEASE.jar!/:na]
13:27:44  	at org.springframework.retry.annotation.AnnotationAwareRetryOperationsInterceptor.invoke(AnnotationAwareRetryOperationsInterceptor.java:118) ~[spring-retry-1.1.2.RELEASE.jar!/:na]
13:27:44  	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
13:27:44  	at io.trino.benchto.driver.service.BenchmarkServiceClient$$EnhancerBySpringCGLIB$$ddd26b16.finishExecution(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at io.trino.benchto.driver.listeners.BenchmarkServiceExecutionListener.lambda$executionFinished$9(BenchmarkServiceExecutionListener.java:162) ~[trino-benchto-driver-0.20.jar!/:0.13-60-gd9a0e3f]
13:27:44  	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_332]
13:27:44  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_332]
13:27:44  	... 1 common frames omitted

Can you also attach the benchto-service logs? This looks like a failure in Benchto itself, for example when saving results, rather than a problem with running the query in Trino. A failure like that is unexpected, which is why the driver is interrupted.
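To illustrate why a failure while saving results stops the driver even though the query failure itself was only logged, here is a minimal sketch of the pattern visible in the stack trace: a callback chained onto a `CompletableFuture` throws, and the exception only surfaces later, wrapped in an `ExecutionException`, when something calls `get()`. The class and method names (`ListenerFailureSketch`, `reportToService`, `describeFailure`) are hypothetical stand-ins and not Benchto's actual code; the thrown exception merely imitates the HTTP 500 from the log.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class ListenerFailureSketch {
    // Hypothetical stand-in for the driver posting results to benchto-service.
    // It always throws here to imitate the HTTP 500 response seen in the log.
    static void reportToService(String result) {
        throw new IllegalStateException("500 Internal Server Error");
    }

    static String describeFailure() throws InterruptedException {
        // The benchmark result (even for a failed query) is produced normally...
        CompletableFuture<Void> listener = CompletableFuture
                .supplyAsync(() -> "q09 FAILED")
                // ...but the callback that reports it to the service throws.
                .thenAccept(ListenerFailureSketch::reportToService);
        try {
            // The callback's exception surfaces only here, wrapped in ExecutionException.
            listener.get();
            return "ok";
        } catch (ExecutionException e) {
            return "Listener failed with: " + e;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(describeFailure());
    }
}
```

In this sketch the failed query is just data passed along the chain; what produces the exception is the reporting step, which is consistent with the suggestion to look at the benchto-service logs.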

@nineinchnick thank you for the tip.
I checked the benchto-service log and did find this error:
18:02:14.808 ERROR o.h.e.jdbc.spi.SqlExceptionHelper - Data truncation: Data too long for column 'value' at row 1
It looks like the stack trace is too long for that column, so I need to alter the benchmark_runs_attributes table.

Sorry, I missed your response. benchmark_runs_attributes.value shouldn't have a length limit. Which version of Benchto are you using, and with which database?