[Ray Data] map_batches treats num_gpus=0 as specifying a workload to run on a GPU
marwan116 opened this issue
Marwan Sarieddine commented
What happened + What you expected to happen
Explicitly setting num_gpus=0 in a map_batches call is treated as if I were requesting a GPU workload, which is not intuitive. The current implementation seems to treat only num_gpus=None as not requesting GPUs, so an explicit 0 is handled the same way as a positive value.
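To illustrate the suspected cause, here is a minimal sketch of the two ways such a check could be written. Both function names are hypothetical and this is not Ray's actual source; it only mirrors the behavior described above and one possible alternative.

```python
from typing import Optional


def requests_gpu_none_check(num_gpus: Optional[float]) -> bool:
    # Hypothetical: mirrors the behavior reported in this issue, where
    # any value other than None is treated as a GPU request.
    return num_gpus is not None


def requests_gpu_value_check(num_gpus: Optional[float]) -> bool:
    # Hypothetical alternative: only a strictly positive value is
    # treated as a GPU request, so both None and 0 mean "no GPU".
    return num_gpus is not None and num_gpus > 0


for value in (None, 0, 0.5, 1):
    print(value, requests_gpu_none_check(value), requests_gpu_value_check(value))
```

With the second form, num_gpus=0 and num_gpus=None would behave identically, which is what I would expect.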
Versions / Dependencies
Ray 2.12.0
Reproduction script
import ray.data
ds = ray.data.from_items([1, 2, 3, 4, 5])
def identity(batch):
    return batch
ds.map_batches(identity, num_gpus=0)
This raises the following error:
ValueError: `batch_size` must be provided to `map_batches` when requesting GPUs. The optimal batch size depends on the model, data, and GPU used. It is recommended to use the largest batch size that doesn't result in your GPU device running out of memory. You can view the GPU memory usage via the Ray dashboard.
If you change the call to the following, the error is not raised:
ds.map_batches(identity, num_gpus=None)
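As a stopgap on the caller side, a value that may be 0 (for example, one read from a config) can be normalized before it is passed in. This is only a sketch: normalize_num_gpus is a hypothetical helper, not part of Ray, and it relies on the observation above that num_gpus=None does not trigger the error.

```python
from typing import Optional


def normalize_num_gpus(num_gpus: Optional[float]) -> Optional[float]:
    # Hypothetical helper: map an explicit 0 (or 0.0) to None so that
    # it is not interpreted as a GPU request; pass other values through.
    if num_gpus is None or num_gpus == 0:
        return None
    return num_gpus


# Intended usage with the reproduction script above:
#   ds.map_batches(identity, num_gpus=normalize_num_gpus(0))
print(normalize_num_gpus(0), normalize_num_gpus(None), normalize_num_gpus(1))
```

A positive value is passed through unchanged, so batch_size still has to be provided in that case, as the error message requires.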
Issue Severity
Low: It annoys or frustrates me.