ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

[Ray Data] map_batches treats num_gpus=0 as specifying a workload to run on a GPU

marwan116 opened this issue · comments

What happened + What you expected to happen

Explicitly setting num_gpus=0 inside a map_batches call is treated as if I were requesting a workload to run on GPUs, which is not intuitive. The current implementation appears to treat only num_gpus is None as not requesting a GPU workload.
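
For illustration, here is a minimal sketch (hypothetical, not Ray's actual source) contrasting the check that would explain this behavior with the check one would expect:

num_gpus = 0

# Suspected current behavior: any non-None value counts as requesting GPUs.
requests_gpu_current = num_gpus is not None  # True even for num_gpus=0

# Expected behavior: zero GPUs should not count as a GPU workload.
requests_gpu_expected = num_gpus is not None and num_gpus > 0  # False for num_gpus=0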

Versions / Dependencies

Ray 2.12.0

Reproduction script

import ray.data

ds = ray.data.from_items([1, 2, 3, 4, 5])

def identity(batch):
    return batch

ds.map_batches(identity, num_gpus=0)

This raises the following error:

ValueError: `batch_size` must be provided to `map_batches` when requesting GPUs. The optimal batch size depends on the model, data, and GPU used. It is recommended to use the largest batch size that doesn't result in your GPU device running out of memory. You can view the GPU memory usage via the Ray dashboard.

but if you change it to the following, no error is raised:

ds.map_batches(identity, num_gpus=None)
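
As a possible workaround (an assumption based on the error message, not verified), explicitly passing batch_size appears to satisfy the validation even when num_gpus=0:

# Hypothetical workaround: supply batch_size so the GPU-related check passes.
ds.map_batches(identity, num_gpus=0, batch_size=5)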

Issue Severity

Low: It annoys or frustrates me.