hyperspy / rosettasciio

Python library for reading and writing scientific data formats

Home Page: https://hyperspy.org/rosettasciio

Loading/Saving Directly from GPU

CSSFrancis opened this issue

Describe the functionality you would like to see.

I'm not sure what would be required to get this to work with h5py, but zarr and NVIDIA have a nice little library, kvikio, which can be used to load data directly from zarr into GPU memory. The major benefit is that it avoids the overhead of loading data with the CPU and then transferring it to the GPU.

Describe the context

I see this being used in two places:

  • Writing directly to a zarr object from a GPU, which might be much more efficient
  • Fast pipelines for data processing.

Additional information

In general, I think that GPU processing will only continue to improve and start to outperform CPU processing. It might be good to get an idea of how difficult it would be to do something like this.

https://xarray.dev/blog/xarray-kvikio

I haven't tried it, but I suspect that it would be possible to simply specify the zarr store when reading the data, similar to:
https://github.com/rapidsai/kvikio/blob/f8f58581224082cbec98fb00a7b224abe98d3381/python/kvikio/zarr.py#L403-L411

From the rosettasciio documentation, it should be possible to specify the zarr store when loading the file:
https://hyperspy.org/rosettasciio/user_guide/supported_formats/zspy.html#zspy-hyperspy-s-zarr-specification

Maybe what is needed is only to document how to use rosettasciio with kvikio?
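
As a starting point for such documentation, here is a minimal, untested sketch of the idea. `pick_zarr_store` is a hypothetical helper (not part of rosettasciio or kvikio) that returns a kvikio store when one can be created and otherwise falls back to the plain path. It assumes the installed kvikio version exposes `kvikio.zarr.GDSStore` and that the zspy reader accepts a zarr store in place of a filename, as the documentation linked above suggests. Whether the loaded chunks actually stay on the GPU is something that still needs to be checked on real hardware.

```python
import importlib.util


def pick_zarr_store(path, prefer_gpu=True):
    """Return a kvikio GDSStore for `path` if possible, else `path` itself.

    Hypothetical helper for illustration only.
    """
    if prefer_gpu and importlib.util.find_spec("kvikio") is not None:
        try:
            # Assumption: this kvikio version provides `kvikio.zarr.GDSStore`.
            from kvikio.zarr import GDSStore

            return GDSStore(path)
        except Exception:
            # kvikio is installed but unusable here (e.g. no CUDA driver,
            # or a version without GDSStore): fall back to the CPU path.
            pass
    return path


# Usage sketch (commented out, since it needs hyperspy and an existing file).
# Assumption: hs.load accepts a zarr store for the zspy format.
# import hyperspy.api as hs
# s = hs.load(pick_zarr_store("data.zspy"), lazy=True)
```

The fallback keeps the same call working on machines without a GPU, so a documented example would not have to branch on the hardware.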

@ericpre I think that is the case as well. We have one computer in our lab on which I should be able to get this to work. The documentation for kvikio leaves a bit to be desired, and installing it required specific versions of CUDA and a few other dependencies.

Next time I have a microscope session I'll try to see if I can get this to work and we might be able to just add this to the documentation!