google / weather-tools

Tools to make weather data accessible and useful.

Home Page: https://weather-tools.readthedocs.io/

IO backends error

mt467 opened this issue · comments

I ran the following weather-mv command to import a batch of bzip2-compressed GRIB files into BigQuery.

weather-mv --uris "gs://xxxxx/A4D0113*.bz2" --output_table xxxx.xxxxx.ecmwf_realtime_test_a4 --runner DataflowRunner --project xxxxxx --temp_location gs://xxxxx/tmp --job_name ecmwf_xxxxx_test_a4

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 752, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1368, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 238, in extract_rows
    with open_dataset(uri) as ds:
  File "/usr/local/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 94, in open_dataset
    xr_dataset: xr.Dataset = __open_dataset_file(dest_file.name)
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 62, in __open_dataset_file
    return xr.open_dataset(filename)
  File "/usr/local/lib/python3.7/site-packages/xarray/backends/api.py", line 479, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
  File "/usr/local/lib/python3.7/site-packages/xarray/backends/plugins.py", line 155, in guess_engine
    raise ValueError(error_msg)
ValueError: did not find a match in any of xarray's currently installed IO backends ['scipy']. Consider explicitly selecting one of the installed engines via the engine parameter, or installing additional IO dependencies, see: http://xarray.pydata.org/en/stable/getting-started-guide/installing.html http://xarray.pydata.org/en/stable/user-guide/io.html

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 651, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start
  File "apache_beam/runners/worker/operations.py", line 353, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1315, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 752, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1368, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 238, in extract_rows
    with open_dataset(uri) as ds:
  File "/usr/local/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 94, in open_dataset
    xr_dataset: xr.Dataset = __open_dataset_file(dest_file.name)
  File "/usr/local/lib/python3.7/site-packages/loader_pipeline/pipeline.py", line 62, in __open_dataset_file
    return xr.open_dataset(filename)
  File "/usr/local/lib/python3.7/site-packages/xarray/backends/api.py", line 479, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
  File "/usr/local/lib/python3.7/site-packages/xarray/backends/plugins.py", line 155, in guess_engine
    raise ValueError(error_msg)
ValueError: did not find a match in any of xarray's currently installed IO backends ['scipy']. Consider explicitly selecting one of the installed engines via the engine parameter, or installing additional IO dependencies, see: http://xarray.pydata.org/en/stable/getting-started-guide/installing.html http://xarray.pydata.org/en/stable/user-guide/io.html [while running 'ExtractRows']

The issue is that cfgrib is not installed on the Dataflow workers, so the GRIB backend is not available to xarray for reading these files.
There are a couple of ways we can address this:

  1. Experiment with the setup.py script for the weather mover to ensure that cfgrib is installed on the worker machines. We've had limited success with this in the past, since environments are hard to manage with setuptools alone (though it is possible). This is the preferable solution, since it keeps weather-mv very portable.
  2. Provide a Docker image with the necessary dependencies pre-installed, and document how to use it for Dataflow jobs that require cfgrib. This is how we've addressed the issue in the past. We already have a Dockerfile available (with instructions on how to build it in GCP); ideally, we'd publish the image on Docker Hub for quicker access.
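For option 2, a hypothetical Dockerfile along these lines could serve as the custom worker image (the base image tag and apt package name are illustrative assumptions, not the repo's actual Dockerfile; cfgrib needs the ecCodes system library at runtime):

```dockerfile
# Hypothetical custom SDK container; versions are illustrative.
FROM apache/beam_python3.7_sdk:2.35.0

# cfgrib requires the ecCodes C library, installed at the OS level
RUN apt-get update \
    && apt-get install -y --no-install-recommends libeccodes0 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir cfgrib
```

The resulting image would then be passed to the Dataflow job as a custom SDK container (e.g. via the `--sdk_container_image` pipeline option on recent Beam versions).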
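For option 1, a minimal sketch of a `setup.py` that could be handed to Dataflow via `--setup_file` so workers install cfgrib alongside the pipeline code (the package name, version, and dependency list here are illustrative, not the project's actual metadata):

```python
# Hypothetical setup.py for the loader pipeline (names and pins illustrative).
# Run the job with: weather-mv ... --setup_file ./setup.py
import setuptools

REQUIRED = [
    "apache-beam[gcp]",
    "xarray",
    "cfgrib",  # GRIB backend for xarray; missing on stock Dataflow workers
]

if __name__ == "__main__":
    setuptools.setup(
        name="loader_pipeline",
        version="0.0.1",
        packages=setuptools.find_packages(),
        install_requires=REQUIRED,
    )
```

Note that cfgrib also depends on the ecCodes C library, which pip alone cannot install; that system-level dependency is exactly why setuptools-only environments have been hard to manage here.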