google / orbax

Orbax provides common utility libraries for JAX users.

Home Page:https://orbax.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Tensorstore spec configuration

minotru opened this issue · comments

Hi Orbax community,

Under the hood, Orbax uses TensorStore for tensor IO, TensorStore integration is a part of type_handlers.py.

TensorStore comes with KVStore implementations for File, GCS, S3, GRPC, etc.

Unfortunately, TensorStore integration part is quite rigid and does not support any form of extension from client code.
Namely, it only supports 'gcs' and 'file' kvstores, their spec is hard-coded and can't be configured.

I guess Orbax should provide a way to configure custom kvstore spec based on directory. I.e. make typed_handlers.py flexible and extendable and avoid hard-coded if/else. So that, as a user, I can support alternative "directory -> kvstore spec" mapping when needed and patch other parts of tsspec too.

For instance, there could be a class like:

class TsSpecStrategyBase(ABC):
    def supported_for(self, directory: str) -> bool:
        ...

    def get_spec(self, directory: str) -> dict:
        ...

class FileTsSpecStrategy(TsSpecStrategyBase):
    ...


class GrpcTsSpecStrategy(TsSpecStrategyBase):
    ...

class TsSpecStrategyResolver:
    def register_strategy(self, strategy: TsSpecStrategyBase):
        ...

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        ...

Motivation. My colleagues have implemented tsgrpc-compatible storage that we want to use as a checkpoint storage. Unfortunately, we can't use it without custom patches to orbax code. Namely:

  • we've added another if/else to _get_tensorstore_spec to activate grpc driver for paths like yt://*.
  • supported patches via env variables to tspec['metadata'], tspec['kvstore']['config'], spec['kvstore'] (ocdbt) in order to configure experimental_read_coalescing_interval and disable compression.

I will be happy to assist and submit a PR

Regards,
Simon

Hi Simon,

This seems quite reasonable to me. We don't have a mechanism set up for mirroring code from external to internal, but probably an external contributor could just submit a PR and then we can do a manual merge internally.

My thought is that we could have something like:


class TensorStoreSpecStrategy(Protocol):

  def read_spec(self, info: ParamInfo, args: RestoreArgs) -> ts.Spec:
    ...

  def write_spec(self, value: Any, info: ParamInfo, args: SaveArgs) -> ts.Spec:
    ...
  
class ArrayHandler(TypeHandler):

  def __init__(self, ..., tensorstore_strategy: TensorStoreStrategy):
    self._tensorstore_strategy = tensorstore_strategy

  async def serialize(self, ...):
    for value in values:
      spec = self._tensorstore_strategy.write_spec()
      commit_future = await serialization.async_serialize(value, spec)
   return commit_futures

  async def deserialize(self, ...):
    ...

This would be very customizable, though you might have to write a bit more code that you otherwise would. However I think TensorStoreStrategy could also factor out methods like metadata to set the metadata field in the ts.Spec, and a common_spec method that can be delegated to by read_spec and write_spec.

Hi Colin,

Great, I will be happy to contribute.

Thanks, I like the provided code sample, it looks very configurable.

Let me clarify one thing.

I assumed that there should be strategies like TensorStoreFileStrategy, TensorStoreGCSStrategy, MyTensorStoreStrategy... And then there is TensorStoreStrategyResolver that should return a strategy instance based on the given directory (paramInfo). And this TensorStoreStrategyResolver should be passed to ArrayHandler constructor.

While in your code sample ArrayHandler accepts tensorstore_strategy itself, not some resolver. From your point of view, should there be a resolver instead? Or, maybe, there can be a special type of strategy like TensorStoreRouterStrategy that would serve as a resolver?

Hi @minotru, we will finalize the design after an internal meeting and come up a design by end of week. Then you can help implement! Thanks in advance!

@minotru I would like to give you a quick update. We have made decision to go with @cpgaffney1 's design. However, we will implement the Resolver because TensorStoreSpecStrategy can already achieve the customization. The custom TensorStoreSpecStrategy can also stay with client codes, so no integration with Orbax is needed.

Hi @ChromeHearts

Could you please clarify? Seems like "not" is missing in the sentence:

... will (not?) implement the Resolver because TensorStoreSpecStrategy can already achieve ...

Overall, there are some blank spaces in the design that I am not sure about.

I guess an extended version of @cpgaffney1's code sample would be very helpful for me to understand the design better.

Could you please illustrate your vision for:

  1. built-in TensorStoreSpecStrategy initilization, how it is registered with standard type handler instances
  2. how built-in strategy should look like to support both FS and GCS and stay neat
  3. how extension mechanism should look like from user code, i.e. support S3 but keep FS and GCS

Hi @ChromeHearts , @cpgaffney1, could you please circle back here?

Sorry for the late reply!

Yes, Orbax team will not implement the Resolver.

To answer your questions

  1. The current TensorStoreSpec logic will be factored into a TensorStoreSpecStrategy which will be used by type handlers by default (eg. no custom strategy is supplied)
  2. Simple file system and GCS are already supported by the default strategy. If you need support for custom GRPC, you can use a type_handler with custom strategy. If you need all to be supported at the same strategy, you can create a custom strategy that based on the default one with additional GRPC supports.
  3. The default strategy already support FS, GCS & S3 by depending on the prefixes such as /somepath, gs:// & s3://.

Let me know if you have any other question! Thanks!