NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Home Page: https://nvidia.github.io/spark-rapids

GPU OutOfMemory while running DISTINCT on a partitionedBy column of a DeltaTable?

LIN-Yu-Ting opened this issue

We are using Spark RAPIDS + Spark Thrift Server to serve SQL requests on Spark 3.3.0 with RAPIDS 23.10, and we have a Delta table partitioned by a column, runName.

We executed the SQL query SELECT DISTINCT runName FROM table on this table and got the following error messages saying that the GPU is out of memory.

24/06/11 04:14:23.084 [task-result-getter-2] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 38.0 in stage 23.0 (TID 571) (10.0.0.12 executor 2): java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni-release-11-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/limiting_resource_adaptor.hpp:144: Exceeded memory limit
	at ai.rapids.cudf.ColumnVector.fromScalar(Native Method)
	at ai.rapids.cudf.ColumnVector.fromScalar(ColumnVector.java:430)
	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.$anonfun$buildPartitionColumns$1(ColumnarPartitionReaderWithPartitionValues.scala:107)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.buildPartitionColumns(ColumnarPartitionReaderWithPartitionValues.scala:105)
	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.$anonfun$addPartitionValues$1(ColumnarPartitionReaderWithPartitionValues.scala:79)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues$.addPartitionValues(ColumnarPartitionReaderWithPartitionValues.scala:78)
	at com.nvidia.spark.rapids.MultiFileReaderUtils$.$anonfun$addSinglePartitionValuesAndClose$2(GpuMultiFileReader.scala:245)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:56)
	at com.nvidia.spark.rapids.MultiFileReaderUtils$.addSinglePartitionValuesAndClose(GpuMultiFileReader.scala:243)
	at com.nvidia.spark.rapids.MultiFileReaderFunctions.addPartitionValues(GpuMultiFileReader.scala:119)
	at com.nvidia.spark.rapids.MultiFileReaderFunctions.addPartitionValues$(GpuMultiFileReader.scala:114)
	at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.addPartitionValues(GpuParquetScan.scala:2086)
	at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.readBatches(GpuParquetScan.scala:2539)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.liftedTree1$1(GpuMultiFileReader.scala:572)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readBuffersToBatch(GpuMultiFileReader.scala:571)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:764)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:719)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:719)
	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:66)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:66)
	at scala.Option.exists(Option.scala:376)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:66)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:90)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:66)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(aggregate.scala:1922)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(aggregate.scala:1922)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
24/06/11 04:14:48.097 [task-result-getter-0] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 3.2 in stage 23.0 (TID 580) (10.0.0.12 executor 0): com.nvidia.spark.rapids.jni.RetryOOM: GPU OutOfMemory
	at ai.rapids.cudf.Table.contiguousSplit(Native Method)
	at ai.rapids.cudf.Table.contiguousSplit(Table.java:2298)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$4(RmmRapidsRetryIterator.scala:634)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$3(RmmRapidsRetryIterator.scala:632)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$2(RmmRapidsRetryIterator.scala:631)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.$anonfun$splitSpillableInHalfByRows$1(RmmRapidsRetryIterator.scala:625)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.split(RmmRapidsRetryIterator.scala:442)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:557)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:495)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.aggregateInputBatches(aggregate.scala:795)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(aggregate.scala:752)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:749)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:711)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(aggregate.scala:2034)
	at scala.Option.map(Option.scala:230)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:2034)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:1898)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
24/06/11 04:14:36.169 [dispatcher-CoarseGrainedScheduler] INFO  o.a.spark.scheduler.TaskSetManager - Starting task 99.1 in stage 22.0 (TID 574) (10.0.0.12, executor 0, partition 99, PROCESS_LOCAL, 8406 bytes) taskResourceAssignments Map(gpu -> [name: gpu, addresses: 0])
24/06/11 04:14:36.172 [task-result-getter-3] WARN  o.a.spark.scheduler.TaskSetManager - Lost task 98.0 in stage 22.0 (TID 526) (10.0.0.12 executor 0): com.nvidia.spark.rapids.jni.SplitAndRetryOOM: GPU OutOfMemory: could not split inputs and retry
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.split(RmmRapidsRetryIterator.scala:439)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:557)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:495)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:287)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:132)
	at com.nvidia.spark.rapids.CachedGpuBatchIterator.next(GpuDataProducer.scala:125)
	at com.nvidia.spark.rapids.CachedGpuBatchIterator.next(GpuDataProducer.scala:116)
	at com.nvidia.spark.rapids.GpuColumnarBatchWithPartitionValuesIterator.next(GpuColumnarBatchIterator.scala:116)
	at com.nvidia.spark.rapids.GpuColumnarBatchWithPartitionValuesIterator.next(GpuColumnarBatchIterator.scala:101)
	at scala.collection.TraversableOnce$FlattenOps$$anon$2.next(TraversableOnce.scala:522)
	at com.nvidia.spark.rapids.FilePartitionReaderBase.get(GpuMultiFileReader.scala:392)
	at com.nvidia.spark.rapids.FilePartitionReaderBase.get(GpuMultiFileReader.scala:383)
	at com.nvidia.spark.rapids.PartitionIterator.next(dataSourceUtil.scala:39)
	at com.nvidia.spark.rapids.MetricsBatchIterator.next(dataSourceUtil.scala:49)
	at com.nvidia.spark.rapids.MetricsBatchIterator.next(dataSourceUtil.scala:43)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.next(GpuDataSourceRDD.scala:70)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.next(GpuFileSourceScanExec.scala:480)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.next(GpuFileSourceScanExec.scala:469)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:248)
	at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:228)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(aggregate.scala:751)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:749)
	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(aggregate.scala:711)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(aggregate.scala:2034)
	at scala.Option.map(Option.scala:230)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:2034)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(aggregate.scala:1898)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

However, this error does not appear if we execute SELECT DISTINCT on other columns of the table.

In addition, this error happens when we use an Azure NCas64_T4_v3 server with 64 CPU cores and 4 x T4 GPUs. However, the error does not happen when we use a cluster of 2 x NCas8_T4_v3 nodes, each with 8 CPU cores and 1 x T4 GPU.

Intuitively, it seems to be caused by the large size of the partition files (around 100 MB each). However, the error does not show up when we run DISTINCT on other columns, which makes that assumption less convincing. Moreover, since we are running DISTINCT on a partitionedBy column, Spark normally does not need to load the Parquet files to figure out the values of runName; it only needs to read the directory names to get the values. This should not drive the GPU out of memory.

Any ideas?

@LIN-Yu-Ting
Generally we treat GPU OutOfMemory errors as bugs that need to be fixed. There are a few cases where an algorithm cannot be split up into smaller pieces and we cannot fix the issue, but most of the time running out of memory is something that we can work around with spilling to either host memory or disk.

If you want to file three separate bugs, one for each of the stack traces, I am happy to try and fix them. If you don't want to, just let me know and I'll file something myself.

In the short term you have a few options to try to mitigate the problem. The goal is to reduce the memory pressure on the GPU, and there are several config settings you can try that help with that.

You could try and set spark.sql.files.maxPartitionBytes smaller. This should reduce the amount of compressed data each task gets, and would reduce the total number of rows. This does not always work because it cannot split up a single parquet row group.

You could also try to reduce the number of concurrent tasks on the GPU by setting spark.rapids.sql.concurrentGpuTasks to 1. By default we run with 2 tasks on the GPU to try and keep the GPU as busy as possible, but it does increase the amount of data on the GPU.

You could also try and set the spark.rapids.sql.batchSizeBytes config to be smaller. By default we target each task to process about 1 GiB of data, and use up to 3 more GiB of scratch space. But this is a soft limit and we might use more if the algorithm needs it. If we set the target smaller, then it is likely that the overall memory pressure will be less.
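
For illustration, here is a minimal Scala sketch of how those three settings might be applied to an existing SparkSession. The values are only examples to start from, and depending on the plugin version some of these may need to be set at application startup instead (e.g. via spark-defaults.conf or --conf flags when launching the Thrift Server):

    // Example values only; tune for your workload. `spark` is an existing SparkSession
    // with the RAPIDS plugin enabled.
    spark.conf.set("spark.sql.files.maxPartitionBytes", "67108864")   // 64 MiB input splits per task
    spark.conf.set("spark.rapids.sql.concurrentGpuTasks", "1")        // fewer tasks resident on the GPU
    spark.conf.set("spark.rapids.sql.batchSizeBytes", "268435456")    // 256 MiB target batch size

    spark.sql("SELECT DISTINCT runName FROM table").show()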

Please note that all of these are also likely to reduce the performance of your query. Also I am just guessing based on the stack traces. The first and third stack traces look to be running out of memory when trying to read data in from a file.

The second stack trace looks like you are doing a round robin partitioning as part of your job and the sort that is implicit in that operation ran out of memory. That one is harder to diagnose because I don't know which part of the job it is associated with. Sort tends to be fairly well behaved with spilling, but if you can help us with a repro case, that would let us try to figure out exactly what is happening.

@revans2 Thanks in advance for your responsive comments. I do not mind if you turn these three exception stack traces into separate issues. If you can link them to this original one, it will be easier to trace.

About your three quick config changes: I have tried adjusting spark.rapids.sql.concurrentGpuTasks to 1 and it does not seem to help. I will try the other two config changes later.

As for a repro case, I will look into how to share the data or other possibilities.

@revans2
Let me describe our Delta table more precisely:

+-------+-------------------+------+--------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-----------------------------------------------------------------------------+------------+-----------------------------------+
|version|timestamp          |userId|userName|operation              |operationParameters                                                                                                                                            |job |notebook|clusterId|readVersion|isolationLevel|isBlindAppend|operationMetrics                                                             |userMetadata|engineInfo                         |
+-------+-------------------+------+--------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-----------------------------------------------------------------------------+------------+-----------------------------------+
|25     |2024-06-11 12:22:16|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> [], properties -> {}}                                                                                 |null|null    |null     |24         |Serializable  |false        |{numFiles -> 127, numOutputRows -> 9572955254, numOutputBytes -> 13631153367}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|24     |2024-06-11 09:23:06|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> ["runName"], properties -> {"delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"10"}}|null|null    |null     |23         |Serializable  |false        |{numFiles -> 215, numOutputRows -> 9572955254, numOutputBytes -> 12304450868}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|23     |2024-06-02 12:47:30|null  |null    |REPLACE TABLE AS SELECT|{isManaged -> false, description -> null, partitionBy -> ["runName"], properties -> {}}                                                                        |null|null    |null     |22         |Serializable  |false        |{numFiles -> 280, numOutputRows -> 9572955254, numOutputBytes -> 9649503160} |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|22     |2024-05-14 11:20:34|null  |null    |WRITE                  |{mode -> Append, partitionBy -> []}                                                                                                                            |null|null    |null     |21         |Serializable  |true         |{numFiles -> 200, numOutputRows -> 549120214, numOutputBytes -> 290521902}   |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
|21     |2024-05-14 09:53:39|null  |null    |WRITE                  |{mode -> Append, partitionBy -> []}                                                                                                                            |null|null    |null     |20         |Serializable  |true         |{numFiles -> 200, numOutputRows -> 551313299, numOutputBytes -> 459494733}   |null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|

Number of rows: 9,572,955,254
version 22 -> no partitionBy, around 4000 files.
version 23 -> from 22, 280 files. We executed REPLACE TABLE USING DELTA PARTITIONED BY (runName) AS SELECT * FROM Table.
version 24 -> from 23, 215 files. We executed REPLACE TABLE USING DELTA PARTITIONED BY (runName) TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = 2, 'delta.minWriterVersion' = 5) AS SELECT * FROM Table.
version 25 -> from 24, 127 files. We executed REPLACE TABLE USING DELTA AS SELECT * FROM Table on version 24.

version 23 -> 8 cores/1 GPU and 16 cores/1 GPU both get the exception when executing SELECT DISTINCT runName FROM Table.
version 24 -> 8 cores/1 GPU is safe; 16 cores/1 GPU gets the exception when executing SELECT DISTINCT runName FROM Table.
version 25 -> no exception when executing SELECT DISTINCT runName FROM Table.

By the way, it seems that modifying spark.rapids.sql.batchSizeBytes, spark.rapids.sql.concurrentGpuTasks, and spark.sql.files.maxPartitionBytes does not help. The exception comes out only when running DISTINCT on the partitionedBy column, which makes me believe something is wrong when reading a partitionedBy column with the GPU.

Thanks for the updated information. We will try and reproduce this locally and see what we can come up with. For now I think I will just move this over to a bug and then we can figure out if we need to split it up further.

P0 scope is to identify opportunity for improved OOM + retry handling. This could potentially be fixed by chunking the data differently given the partition data being read.

@mattahrens I have downloaded the spark-rapids project myself and built it successfully. Do you have any suggestions on how to run it in my own IDE with a debugger to understand the mechanism behind it?

Do you have any suggestions on how to run it in my own IDE with a debugger to understand the mechanism behind it?

Thanks for the interest, @LIN-Yu-Ting! Probably the best way to get started is to review the contributors guide, which covers how to set up the IDE. This project is a bit tricky for IDEs since we support multiple Spark versions, and the way we implement the shims can confuse many IDEs.

As for places to look in the code once you have it reproducing in a debugger (I'm assuming in local mode to make it easier), that's a bit tricky since the code is fairly complicated. We support multiple modes for reading. The stack trace shows we're using MultiFileCloudPartitionReaderBase, which indicates MultiFileCloudParquetPartitionReader is the columnar reader for the files. You could start tracing through the code from there.

One of the stack traces shows we're running out of memory while manifesting the partition values as part of the columnar batch. I see you're using version 23.10, and there have been some memory handling improvements in this area, notably via #9230, which adds the split-and-retry framework when dealing with partition values. Have you tried running with a more recent version to see if the problem persists?

Thinking about this a bit more, I suspect this is a miscalculation in the batch-size memory budgeting with partition values. Currently we try to hit a target batch size when loading from a Parquet file, but partition columns are not in that file. Therefore when we tack partition columns onto the batch that was loaded, they are "extra" in terms of the memory budgeting. That's a problem that should be fixed. It's particularly bad when all that's being loaded is the partition column, since we're just loading row counts from the Parquet file. The memory cost of those rows is zero, as there's no column data being loaded, just row counts. That means we'll load maximum batch sizes, and then when we try to apply the partition values we'll go way past the memory budgets.
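
As a back-of-the-envelope illustration of that gap (the per-row size below is an assumption, not a measured value):

    // Illustrative arithmetic only, not the plugin's actual accounting.
    // If only the partition column is projected, the Parquet read contributes roughly
    // zero bytes, so a batch can grow to the row-count limit before partition values
    // are materialized on the GPU.
    val rows = Int.MaxValue.toLong          // default spark.rapids.sql.reader.batchSizeRows
    val bytesPerPartitionValue = 16L        // assumed size of one runName entry per row
    val manifested = rows * bytesPerPartitionValue
    println(f"~${manifested / math.pow(1024, 3)}%.0f GiB to materialize the partition values")
    // ~32 GiB, far beyond a ~1 GiB target batch and the 16 GB of memory on a T4.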

There is a config that controls the maximum number of rows we'll load, spark.rapids.sql.reader.batchSizeRows, which has a default of max int. Can you try running with lower values for this config to see if it works around the issue? For example, you could try setting it to 1000000, or factors of ten below that, to see if it avoids the OOM.
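
A hedged sketch of trying that (assuming the config can be changed per session; otherwise set it at launch with --conf):

    // Cap the number of rows per read batch so the manifested partition values
    // cannot blow far past the batch-size budget. Start at 1000000 and step down
    // by factors of ten (100000, 10000, ...) if the OOM persists.
    spark.conf.set("spark.rapids.sql.reader.batchSizeRows", "1000000")
    spark.sql("SELECT DISTINCT runName FROM table").show()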

@jlowe Many thanks for your response and suggestions.

I have tried the newest RAPIDS jar, rapids-4-spark_2.12_24.06.0.jar, downloaded from the Maven repository, and executed the same query SELECT DISTINCT runName FROM Table on:

Version 22 -> No problem
Version 23 -> With the following exception
Version 24 -> No longer a problem
Version 25 -> No problem

02:31:01.963 WARN  TaskSetManager - Lost task 67.0 in stage 21.0 (TID 381) (10.0.0.10 executor 0): java.lang.ArrayIndexOutOfBoundsException: 0
	at ai.rapids.cudf.Table.<init>(Table.java:58)
	at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:524)
	at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.splitColumnarBatch(BatchWithPartitionData.scala:418)
	at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.splitAndCombineBatchWithPartitionData(BatchWithPartitionData.scala:360)
	at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.$anonfun$addPartitionValuesToBatch$1(BatchWithPartitionData.scala:188)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.addPartitionValuesToBatch(BatchWithPartitionData.scala:181)
	at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.readBatches(GpuParquetScan.scala:2518)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.liftedTree1$1(GpuMultiFileReader.scala:483)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readBuffersToBatch(GpuMultiFileReader.scala:482)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:675)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:630)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:630)
	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:73)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:73)
	at scala.Option.exists(Option.scala:376)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:73)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:97)
	at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

I also tested the newest jar on the following two tables:

  1. Table with columnMapping
|0      |2024-06-19 02:01:07|null  |null    |CREATE TABLE AS SELECT|{isManaged -> true, description -> null, partitionBy -> ["runName"], properties -> {"delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"10"}}|null|null    |null     |null       |Serializable  |true         |{numFiles -> 280, numOutputRows -> 9572955254, numOutputBytes -> 9910950915}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|
  2. Table without columnMapping
|0      |2024-06-19 02:17:33|null  |null    |CREATE TABLE AS SELECT|{isManaged -> true, description -> null, partitionBy -> ["runName"], properties -> {}}|null|null    |null     |null       |Serializable  |true         |{numFiles -> 280, numOutputRows -> 9572955254, numOutputBytes -> 9910154837}|null        |Apache-Spark/3.3.0 Delta-Lake/2.3.0|

With the same amount of data, the table with the columnMapping property encounters the same error as above. This differs a little from the previous case, as Version 23 does not apply column mapping while Version 24 does.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
  at ai.rapids.cudf.Table.<init>(Table.java:58)
  at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:524)
  at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.splitColumnarBatch(BatchWithPartitionData.scala:418)
  at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.splitAndCombineBatchWithPartitionData(BatchWithPartitionData.scala:360)
  at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.$anonfun$addPartitionValuesToBatch$1(BatchWithPartitionData.scala:188)
  at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
  at com.nvidia.spark.rapids.BatchWithPartitionDataUtils$.addPartitionValuesToBatch(BatchWithPartitionData.scala:181)
  at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader.readBatches(GpuParquetScan.scala:2518)
  at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.liftedTree1$1(GpuMultiFileReader.scala:483)
  at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readBuffersToBatch(GpuMultiFileReader.scala:482)
  at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:675)
  at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:630)
  at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
  at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:630)
  at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
  at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
  at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:73)
  at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:73)
  at scala.Option.exists(Option.scala:376)
  at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:73)
  at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:97)
  at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:73)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
  at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
  at scala.Option.getOrElse(Option.scala:189)
  at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
  at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
  at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
  at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
  at org.apache.spark.scheduler.Task.run(Task.scala:136)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750)

The ArrayIndexOutOfBoundsException is a bug in BatchWithPartitionDataUtils.splitColumnarBatch where it's not handling a row-count-only columnar batch properly. Filed #11155 to track that bug.
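
For reference, a "row-count-only" batch is a ColumnarBatch that carries a row count but no column vectors, which is what the scan can produce when the query only needs partition values. The snippet below is only an illustration of that shape, not the plugin's code:

    import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

    // A batch with zero columns but a non-zero row count (illustration only).
    val rowsOnly = new ColumnarBatch(Array.empty[ColumnVector], 1000000)
    assert(rowsOnly.numCols() == 0 && rowsOnly.numRows() == 1000000)
    // Code that assumes at least one column exists (e.g. batch.column(0)) fails on
    // such a batch, consistent with the ArrayIndexOutOfBoundsException: 0 above.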

@jlowe Thanks.

I have tested branch 24.08 with a jar I built myself, and it works well in a JupyterLab environment.
However, I got the following exception when launching with Spark Thrift Server.

java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundExprMeta
        at com.nvidia.spark.rapids.GpuOverrides$.wrapExpr(GpuOverrides.scala:843)
        at com.nvidia.spark.rapids.SparkPlanMeta.$anonfun$childExprs$1(RapidsMeta.scala:641)
        at scala.collection.immutable.Stream.map(Stream.scala:418)
        at com.nvidia.spark.rapids.SparkPlanMeta.<init>(RapidsMeta.scala:641)
        at com.nvidia.spark.rapids.DoNotReplaceOrWarnSparkPlanMeta.<init>(RapidsMeta.scala:933)
        at com.nvidia.spark.rapids.GpuOverrides$.doWrap$1(GpuOverrides.scala:812)
        at com.nvidia.spark.rapids.GpuOverrides$.$anonfun$neverReplaceExec$1(GpuOverrides.scala:813)
        at com.nvidia.spark.rapids.ReplacementRule.wrap(GpuOverrides.scala:213)
        at com.nvidia.spark.rapids.GpuOverrides$.$anonfun$wrapPlan$1(GpuOverrides.scala:4002)
        at scala.Option.map(Option.scala:230)
        at com.nvidia.spark.rapids.GpuOverrides$.wrapPlan(GpuOverrides.scala:4002)
        at com.nvidia.spark.rapids.GpuOverrides$.wrapAndTagPlan(GpuOverrides.scala:4365)
        at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4692)
        at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4577)
        at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:454)
        at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4574)
        at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4540)
        at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4594)
        at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4567)
        at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4563)
        at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:553)
        at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:553)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:553)
        at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:514)
        at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:440)
        at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:439)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:158)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:158)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
        at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
        at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
        at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:291)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:216)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:277)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkOperation$$super$run(SparkExecuteStatementOperation.scala:43)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.$anonfun$run$1(SparkOperation.scala:45)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.run(SparkOperation.scala:45)
        at org.apache.spark.sql.hive.thriftserver.SparkOperation.run$(SparkOperation.scala:43)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(SparkExecuteStatementOperation.scala:43)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:484)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:71)
        at org.apache.hive.service.cli.session.HiveSessionProxy.lambda$invoke$0(HiveSessionProxy.java:58)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:58)
        at com.sun.proxy.$Proxy60.executeStatement(Unknown Source)
        at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:280)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:456)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
        at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:180)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:523)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
        at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
        at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
        at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.sparkproject.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
        at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
        at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.sparkproject.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
        at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
        at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.sparkproject.jetty.server.Server.handle(Server.java:516)
        at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
        at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
        at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
        at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.sparkproject.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
        at org.sparkproject.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
        at org.sparkproject.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
        at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundExprMeta
        ... 139 common frames omitted
Caused by: java.lang.ClassNotFoundException: com.nvidia.spark.rapids.RuleNotFoundExprMeta
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 139 common frames omitted

I also tested with version 24.06 and got the same error. It seems to be the same error that I have already reported before.

It seems to be #9966, which I have already reported before.

Yes, this is likely the same issue. I assume the workaround you noted before still works?

@jlowe Yes. The workaround still works. Thanks for your help.

Glad to hear you have a path forward with the workaround. Did I understand correctly that running the 24.08 snapshot with the fix from #11155 resolved this issue? We can track the independent thrift server issues via #9966.