Can I have a defined plot for a transform dataset?
kthyng opened this issue
I'm defining a catalog in which there is a "base" CSV dataset (ctd_base below) and a dataset called ctd that is a transformed/derived version of ctd_base, which does some processing to produce a more usable dataset. I would like to have a plot available for source ctd. Is there a way to do this? Below is a version of what I've tried, but I haven't been able to get it to work: calling cat.ctd.plot.example() fails with the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[18], line 1
----> 1 cat["ctd"].plot.example()
File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/plotting/core.py:92, in hvPlotBase.__call__(self, x, y, kind, **kwds)
89 plot = self._get_converter(x, y, kind, **kwds)(kind, x, y)
90 return pn.panel(plot, **panel_dict)
---> 92 return self._get_converter(x, y, kind, **kwds)(kind, x, y)
File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/plotting/core.py:99, in hvPlotBase._get_converter(self, x, y, kind, **kwds)
97 y = y or params.pop("y", None)
98 kind = kind or params.pop("kind", None)
---> 99 return HoloViewsConverter(self._data, x, y, kind=kind, **params)
File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/converter.py:389, in HoloViewsConverter.__init__(self, data, x, y, kind, by, use_index, group_label, value_label, backlog, persist, use_dask, crs, fields, groupby, dynamic, grid, legend, rot, title, xlim, ylim, clim, symmetric, logx, logy, loglog, hover, subplots, label, invert, stacked, colorbar, datashade, rasterize, row, col, debug, framewise, aggregator, projection, global_extent, geo, precompute, flip_xaxis, flip_yaxis, dynspread, hover_cols, x_sampling, y_sampling, project, tools, attr_labels, coastline, tiles, sort_date, check_symmetric_max, transforms, stream, cnorm, features, rescale_discrete_levels, **kwds)
387 self.value_label = value_label
388 self.label = label
--> 389 self._process_data(
390 kind, data, x, y, by, groupby, row, col, use_dask,
391 persist, backlog, label, group_label, value_label,
392 hover_cols, attr_labels, transforms, stream, kwds
393 )
395 self.dynamic = dynamic
396 self.geo = any([geo, crs, global_extent, projection, project, coastline, features])
File ~/miniconda3/envs/ciofs/lib/python3.11/site-packages/hvplot/converter.py:800, in HoloViewsConverter._process_data(self, kind, data, x, y, by, groupby, row, col, use_dask, persist, backlog, label, group_label, value_label, hover_cols, attr_labels, transforms, stream, kwds)
798 self.data = data
799 else:
--> 800 raise ValueError('Supplied data type %s not understood' % type(data).__name__)
802 if stream is not None:
803 if streaming:
ValueError: Supplied data type DataFrameTransform not understood
Catalog:
name: ctd
description: CTD
sources:
  ctd_base:
    description: Base
    driver: csv
    args:
      urlpath: /Users/kthyng/projects/ciofs-hindcast-report/ciofs_hindcast_report/inputs/data/CTD_KBNERR_301933/301933.csv
  ctd:
    description: CTD
    driver: process.DataFrameTransform
    args:
      targets:
        - ctd_base
      transform: "process.ctd"
      transform_kwargs:
        station: kacbcwq
    metadata:
      plots:
        example:
          kind: line
          x: DateTimeStamp
          y: Temp
          width: 800
          height: 600
Also, process.DataFrameTransform is the same as the one provided in intake, and process.ctd runs some processing on the DataFrame.
Thanks for any help!
Is the output of source "ctd" (the result of .read() ) also a dataframe?
I tried the following catalog:
name: ctd
description: CTD
sources:
  ctd_base:
    description: Base
    driver: csv
    args:
      urlpath: data.csv
  ctd:
    description: CTD
    driver: intake.source.derived.DataFrameTransform
    args:
      targets:
        - ctd_base
      transform: "toolz.identity"
      transform_kwargs: {}
    metadata:
      plots:
        example:
          kind: line
and this data.csv:
a,b
0,1
0.1, 1.1
With these, cat.ctd.plot() and cat.ctd.plot.example() both ran successfully.
Thank you @martindurant! I realize now that my actual problem is that the catalog isn't finding the transform since I recently rearranged the directory structure, not that the source can't understand the plot. I can't get it to recognize my version of DataFrameTransform, which has one difference (it computes the Dask dataframe earlier), but the problem must be in my setup, since it worked before I reorganized.
@martindurant OK, I see how I got confused: the catalog entry ctd works with my slightly changed version of DataFrameTransform when I'm just accessing the data with read(). However, when I add a plot to the metadata and then try to plot, it is hvplot that cannot handle my version of DataFrameTransform, failing with this error:
798 self.data = data
799 else:
--> 800 raise ValueError('Supplied data type %s not understood' % type(data).__name__)
802 if stream is not None:
803 if streaming:
ValueError: Supplied data type DataFrameTransform not understood
Does hvplot have different rules for how it finds the inputs to catalog entries? The plot works when I use driver: intake.source.derived.DataFrameTransform, but not when I point the driver at my own DataFrameTransform, even though that works with .read().
I'm not exactly sure how hvplot determines the data type, but you should make sure that your class is at least a subclass of intake.source.DataSource. Since I didn't get your exception, it's tricky for me to say what might be going on. You might want to enter the debugger and find out the value of data when passing a standard DataFrameTransform versus your version.
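One way to follow that debugging suggestion is to print where the class of the object hvplot receives is defined and what it inherits from. The sketch below is a hypothetical helper (describe_type is not part of intake or hvplot), and DataSource/DataFrameTransform here are stand-in classes, not the real intake ones; in a real session you would pass the catalog entry (e.g. cat.ctd) instead.

```python
def describe_type(obj):
    """Summarize where obj's class is defined and what it inherits from.

    Hypothetical debugging helper: returns the class name, the module the
    class was defined in, and the qualified name of every class in its MRO.
    """
    cls = type(obj)
    return {
        "class": cls.__name__,
        "module": cls.__module__,
        "mro": [f"{c.__module__}.{c.__qualname__}" for c in cls.__mro__],
    }


# Stand-ins for illustration only (NOT the real intake classes).
class DataSource:
    pass


class DataFrameTransform(DataSource):
    pass


# Pretend the subclass lives in a project package, as in this thread.
DataFrameTransform.__module__ = "ciofs_hindcast_report.src.process"

if __name__ == "__main__":
    info = describe_type(DataFrameTransform())
    print(info["module"])  # ciofs_hindcast_report.src.process
    for name in info["mro"]:
        print(name)
```

Running it once on the built-in intake.source.derived.DataFrameTransform entry and once on the custom one would make any difference in the defining module or the base classes visible side by side.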
Thank you for the suggestion. I dug into the relevant code in hvplot, and the problem happens earlier: the incoming data is not identified as an intake source because the module path of my transform doesn't start with "intake".
I'm going to see if I can just use the built-in DataFrameTransform to avoid this issue.
Hm, it's definitely going to be a problem as soon as I want to use my DatasetTransform, which will be soon. Dang. I guess I'll need to go post an issue at hvplot.
isinstance(data, DataSource) is the actual check. Does that fail for your class? The "intake" check just looks to see whether the library can be imported; you shouldn't need a specific name.
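The behaviour described here (check only that the library is importable, then let isinstance decide) could be sketched as below. This is a hypothetical illustration, not hvplot's actual code: library_available and is_source are made-up names, and DataSource is a stand-in for intake's base class so the snippet runs without intake installed.

```python
import importlib.util


def library_available(name):
    """Return True if the top-level package `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def is_source(data, base_cls, library="intake"):
    """Hypothetical gate: the library must be importable, then isinstance decides.

    The module in which type(data) is defined plays no role here.
    """
    if not library_available(library):
        return False
    return isinstance(data, base_cls)


class DataSource:
    """Stand-in for intake.source.DataSource (illustration only)."""


class MyTransform(DataSource):
    """A subclass defined outside the 'intake' package."""


MyTransform.__module__ = "ciofs_hindcast_report.src.process"
```

With this kind of gate, a custom transform living in any package is accepted as long as it subclasses the base class.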
When using my transform, the code doesn't get far enough to check isinstance(data, DataSource): it returns early because of if not check_library(data, 'intake'). But you make a good point: my transform does pass the actual check, isinstance(data, DataSource).
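The control flow described here can be reproduced with stand-in classes: the module-name check runs first and returns early, so isinstance is never reached for a class defined outside intake, even though that class does subclass DataSource. This is a simplified sketch of the logic as described in this thread, not a copy of hvplot's source; DataSource, BuiltinTransform, MyTransform and is_intake_like are illustrative names.

```python
def check_library(obj, library):
    # Same logic as the hvplot helper quoted in this thread: compare the first
    # dotted component of obj.__module__ with the library name(s).
    if not isinstance(library, list):
        library = [library]
    return any(obj.__module__.split(".")[0].startswith(lib) for lib in library)


class DataSource:
    """Stand-in for intake.source.DataSource (illustration only)."""


def is_intake_like(data):
    # Step 1: module-name gate; returns early for classes defined outside intake.
    if not check_library(data, "intake"):
        return False
    # Step 2: the real type check, only reached if step 1 passes.
    return isinstance(data, DataSource)


class BuiltinTransform(DataSource):
    pass


class MyTransform(DataSource):
    pass


# Simulate where each class is defined.
BuiltinTransform.__module__ = "intake.source.derived"
MyTransform.__module__ = "ciofs_hindcast_report.src.process"
```

Here MyTransform() is an instance of DataSource, yet is_intake_like rejects it at step 1, which matches the behaviour reported above.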
Oh, you are right and I am wrong:
def check_library(obj, library):
    if not isinstance(library, list):
        library = [library]
    return any([obj.__module__.split('.')[0].startswith(l) for l in library])
requires the object's module path to be in the intake namespace. This is a totally unnecessary requirement! Can you please open an issue with hvplot?
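If a module-name check is kept at all, one hypothetical way to relax it is to look at the modules of every class in the object's MRO, so that a subclass of an intake class defined elsewhere still passes. This is only a sketch of an alternative (check_library_mro is a made-up name, and the classes are stand-ins), not what hvplot does; an isinstance check against DataSource remains the more direct test.

```python
def check_library_mro(obj, library):
    """Hypothetical variant of check_library.

    Returns True if obj's class, or any of its base classes, is defined in a
    module whose top-level package name starts with one of `library`.
    """
    if not isinstance(library, list):
        library = [library]
    top_levels = {cls.__module__.split(".")[0] for cls in type(obj).__mro__}
    return any(mod.startswith(lib) for mod in top_levels for lib in library)


# Stand-in classes mimicking the situation in this thread.
class DataSource:
    pass


class MyTransform(DataSource):
    pass


DataSource.__module__ = "intake.source.base"
MyTransform.__module__ = "ciofs_hindcast_report.src.process"

if __name__ == "__main__":
    obj = MyTransform()
    # Module-only check, as in the quoted helper: rejects the subclass.
    print(obj.__module__.split(".")[0].startswith("intake"))  # False
    # MRO-aware check: accepts it because its base class is "in" intake.
    print(check_library_mro(obj, "intake"))  # True
```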
Regarding "The 'intake' check just looks to see whether the library can be imported; you shouldn't need a specific name": the expression
any([obj.__module__.split('.')[0].startswith(l) for l in library])
in my case has
ipdb> obj.__module__
'ciofs_hindcast_report.src.process'
and compares that with library, which in this case is ['intake']:
ipdb> library
['intake']
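Plugging the values from this ipdb session into the quoted expression shows why it returns False. This only evaluates the expression with those two values; nothing else is assumed.

```python
# Values copied from the ipdb session above.
module = "ciofs_hindcast_report.src.process"
library = ["intake"]

# check_library only looks at the first dotted component of the module path.
top_level = module.split(".")[0]
result = any([top_level.startswith(lib) for lib in library])

print(top_level)  # ciofs_hindcast_report
print(result)  # False
```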