ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

[Core] Un-Deprecate Dynamic Generators

npilon opened this issue

Description

num_returns='dynamic' generators were deprecated abruptly in Ray 2.9. These generators are useful for implementing tasks (or actor methods) that perform a series of operations. The work can be expressed as a generator while keeping the same semantics as ordinary Ray tasks, such as not completing until the actor is free for its next task.

Use case

Our Ray-based data pipeline makes heavy use of dynamic generator tasks to express operations that produce multiple results. Often they apply a series of strategies, using Python's yield from syntax to concisely produce results and hand back the remaining work in a single statement:

remaining_work = yield from primary_strategy(original_work)
still_remaining = yield from secondary_strategy(remaining_work)
# etc
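For readers unfamiliar with the pattern, here is a minimal plain-Python sketch of how `yield from` can both stream results and hand back leftover work through the sub-generator's return value. It does not use Ray at all. The bodies of `primary_strategy` and `secondary_strategy`, the integer work items, and the `process` and `drain` helpers are hypothetical stand-ins, not the pipeline's actual code.

```python
from typing import Generator, List, Tuple

# Hypothetical strategy: handles even items, returns whatever it could not handle.
def primary_strategy(work: List[int]) -> Generator[str, None, List[int]]:
    remaining = []
    for item in work:
        if item % 2 == 0:
            yield f"primary:{item}"
        else:
            remaining.append(item)
    return remaining  # becomes the value of the `yield from` expression

# Hypothetical fallback strategy: handles multiples of three.
def secondary_strategy(work: List[int]) -> Generator[str, None, List[int]]:
    remaining = []
    for item in work:
        if item % 3 == 0:
            yield f"secondary:{item}"
        else:
            remaining.append(item)
    return remaining

def process(original_work: List[int]) -> Generator[str, None, List[int]]:
    # Each statement streams the strategy's results to the caller and
    # captures the work that strategy left undone.
    remaining_work = yield from primary_strategy(original_work)
    still_remaining = yield from secondary_strategy(remaining_work)
    return still_remaining

def drain(gen: Generator[str, None, List[int]]) -> Tuple[List[str], List[int]]:
    """Collect every yielded result plus the generator's return value."""
    results = []
    while True:
        try:
            results.append(next(gen))
        except StopIteration as stop:
            return results, stop.value

results, leftover = drain(process([1, 2, 3, 4, 5, 6]))
print(results)   # ['primary:2', 'primary:4', 'primary:6', 'secondary:3']
print(leftover)  # [1, 5]
```

The point of the sketch is only that one `yield from` statement does two jobs at once, which is the conciseness the issue describes.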

These tasks usually interact with unreliable external services that loosely but aggressively rate-limit access, and they require managing state such as authentication sessions and database connections. We manage these constraints using ActorPool, plus helpers for dealing with errors. Dynamic generator tasks are ideal for our use case because:

  • They have the same API semantics as ordinary tasks: their result handle is only ready once the actor is free. (Streaming generators can do this, but require a different API, which violates duck typing principles and significantly complicates maintenance.)
  • They clearly and consistently report errors and provide the completed results.
  • They enable us to use Python's rich generator syntax.

Hi @npilon

Are you able to use the new streaming generator? https://docs.ray.io/en/latest/ray-core/ray-generator.html

Streaming generators do not meet important parts of my use case: they do not have the same API semantics as ordinary tasks. As a result, all general-purpose intermediate infrastructure has to detect which kind of task it is handling and select the appropriate API. This is a substantial additional maintenance burden.