ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

[Ray Data] map_batches treats num_gpus=0 as specifying a workload to run on a GPU

marwan116 opened this issue · comments

What happened + What you expected to happen

Explicitly setting num_gpus=0 inside a map_batches call is treated as if I were requesting a workload to run on GPUs, which is not intuitive. The current implementation appears to treat only num_gpus is None as not requesting a GPU workload.
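
For illustration, here is a minimal sketch (hypothetical, not Ray's actual source) contrasting the check that would explain this behavior with the check one would expect:

num_gpus = 0

# Suspected current behavior: any non-None value counts as requesting GPUs.
requests_gpu_current = num_gpus is not None  # True even for num_gpus=0

# Expected behavior: zero GPUs should not count as a GPU workload.
requests_gpu_expected = num_gpus is not None and num_gpus > 0  # False for num_gpus=0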

Versions / Dependencies

Ray 2.12.0

Reproduction script

import ray.data

ds = ray.data.from_items([1, 2, 3, 4, 5])

def identity(batch):
    return batch

ds.map_batches(identity, num_gpus=0)

This raises the following error:

ValueError: `batch_size` must be provided to `map_batches` when requesting GPUs. The optimal batch size depends on the model, data, and GPU used. It is recommended to use the largest batch size that doesn't result in your GPU device running out of memory. You can view the GPU memory usage via the Ray dashboard.

but if you change it to the following, no error is raised:

ds.map_batches(identity, num_gpus=None)
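
As a possible workaround (an assumption based on the error message, not verified), explicitly passing batch_size appears to satisfy the validation even when num_gpus=0:

# Hypothetical workaround: supply batch_size so the GPU-related check passes.
ds.map_batches(identity, num_gpus=0, batch_size=5)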

Issue Severity

Low: It annoys or frustrates me.