Making catalog saving more robust
kthyng opened this issue
Hi! Right now, saving a catalog relies on `serialize` in base.py, which relies on `_captured_init_kwargs`. This has worked for me in the past. This time the situation seems the same, but I cannot save a new batch of catalogs, because all the important information is stored in `_captured_init_args` instead of `_captured_init_kwargs`. I have looked through many files and still can't tell what controls that.
(Referenced code: lines 276 to 311 at commit 4760112.)
Perhaps a better approach would be to make the `serialize` function more robust so that it accepts the information in either place (or stores it in individual saved attributes instead of "captured" ones)? I am not sure what a good design would be. At this point, I can only think of a simple combination of the two. What do you think? I am also not sure what all the related issues are.
Thanks.
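To see why the same source can end up in either attribute, here is a minimal sketch of the capture pattern. It assumes capture works like a `__new__` hook that records the raw constructor call. `CaptureMixin`, `ToySource`, and `captured_as_kwargs` are hypothetical names, not intake's API. The last helper shows one way to do the "simple combination of the two": it binds positional values to parameter names with `inspect.signature`, so the result no longer depends on how the object was called.

```python
import inspect


class CaptureMixin:
    """Hypothetical stand-in for intake's argument capture (an assumption)."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        # Record the constructor call exactly as it was made.
        obj._captured_init_args = args
        obj._captured_init_kwargs = kwargs
        return obj


class ToySource(CaptureMixin):
    """Hypothetical source with two required parameters."""

    def __init__(self, dataset_id, protocol, metadata=None):
        self.dataset_id = dataset_id
        self.protocol = protocol
        self.metadata = metadata or {}


def captured_as_kwargs(obj):
    """Return all captured constructor arguments as a name -> value dict."""
    sig = inspect.signature(type(obj).__init__)
    # None fills the `self` slot so positional values line up with names.
    bound = sig.bind(None, *obj._captured_init_args, **obj._captured_init_kwargs)
    merged = dict(bound.arguments)
    del merged["self"]
    return merged


positional = ToySource("ds1", "tabledap")
keyword = ToySource(dataset_id="ds1", protocol="tabledap")

print(positional._captured_init_kwargs)  # {}
print(keyword._captured_init_kwargs)     # {'dataset_id': 'ds1', 'protocol': 'tabledap'}
print(captured_as_kwargs(positional))    # {'dataset_id': 'ds1', 'protocol': 'tabledap'}
```

If `__init__` accepts `**kwargs`, the extra keywords come back nested under that parameter's name and would need flattening before serialising.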
Actually, there's no real reason we can't support positional (ordered) args as well as kwargs. The YAML stub for a source has an "args" entry, which is a key-value map treated like a kwargs dict. One of its entries could be a special key that gets turned into `*args`.
This would take some development. It may make more sense to find the reason that your particular source is special.
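A minimal sketch of the special-key idea, assuming the stub's "args" entry is loaded as a plain dict. The reserved key name `__positional__`, the helpers `pack_args` and `unpack_args`, and `ToySource` are all hypothetical; intake does not define them. Packing stores any positional values under the reserved key, and unpacking pops them back out before the constructor call.

```python
POSITIONAL_KEY = "__positional__"  # hypothetical reserved key in the "args" mapping


def pack_args(args, kwargs):
    """Fold positional and keyword arguments into one stub-style mapping."""
    if POSITIONAL_KEY in kwargs:
        raise ValueError(f"{POSITIONAL_KEY!r} is reserved and cannot be a kwarg")
    mapping = dict(kwargs)
    if args:
        # Store as a list so it maps onto a plain YAML sequence.
        mapping[POSITIONAL_KEY] = list(args)
    return mapping


def unpack_args(mapping):
    """Split a stub-style mapping back into (args, kwargs)."""
    kwargs = dict(mapping)
    args = tuple(kwargs.pop(POSITIONAL_KEY, ()))
    return args, kwargs


class ToySource:
    """Hypothetical source whose first two parameters are positional."""

    def __init__(self, dataset_id, protocol, constraints=None):
        self.dataset_id = dataset_id
        self.protocol = protocol
        self.constraints = constraints or {}


stub_args = pack_args(("ds1", "tabledap"), {"constraints": {"time>=": "2020-01-01"}})
print(stub_args)
# {'constraints': {'time>=': '2020-01-01'}, '__positional__': ['ds1', 'tabledap']}

args, kwargs = unpack_args(stub_args)
rebuilt = ToySource(*args, **kwargs)
print(rebuilt.dataset_id, rebuilt.protocol)  # ds1 tabledap
```

The explicit check in `pack_args` means a genuine keyword argument can never collide with the reserved name.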
The catalog I've been making is from intake-erddap: https://github.com/axiom-data-science/intake-erddap. I can't figure out what is different about it that makes it so I can't save it! I can add a MWE tomorrow.
I am assuming you have some sources, then, like https://github.com/axiom-data-science/intake-erddap/blob/main/intake_erddap/erddap.py#L43 .
This has some positional arguments (`dataset_id`, `protocol`), which are presumably generated by the parent catalog instance. I would suggest passing them as `dataset_id=` and `protocol=` keyword arguments; then you will have no problems. Alternatively, you could edit the captured args to make sure this is the case before serialising (or that library could provide its own serialisation).
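If changing the call site is not possible, the other option, editing the captured args before serialising, could look like the sketch below. It assumes capture works like a `__new__` hook that stores `_captured_init_args` and `_captured_init_kwargs`. `CaptureMixin`, `ToySource`, `promote_positional`, and the example server URL are hypothetical. The parameter names are passed in explicitly, since only the source library knows which positions mean `dataset_id` and `protocol`.

```python
class CaptureMixin:
    """Hypothetical stand-in for intake's argument capture (an assumption)."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj._captured_init_args = args
        obj._captured_init_kwargs = kwargs
        return obj


class ToySource(CaptureMixin):
    """Hypothetical source: two positional parameters plus extra keywords."""

    def __init__(self, dataset_id, protocol, **kwargs):
        self.dataset_id = dataset_id
        self.protocol = protocol
        self.extra = kwargs


def promote_positional(obj, names):
    """Rewrite captured positional args as keyword args, in place."""
    args = obj._captured_init_args
    if len(args) > len(names):
        raise ValueError("more positional args than names to assign them to")
    promoted = dict(zip(names, args))
    # Promoted names first, then the keyword args that were already captured.
    obj._captured_init_kwargs = {**promoted, **obj._captured_init_kwargs}
    obj._captured_init_args = ()
    return obj


src = ToySource("ds1", "tabledap", server="https://example.invalid/erddap")
promote_positional(src, ("dataset_id", "protocol"))
print(src._captured_init_args)  # ()
print(src._captured_init_kwargs)
# {'dataset_id': 'ds1', 'protocol': 'tabledap', 'server': 'https://example.invalid/erddap'}
```

After promotion, a kwargs-only serializer sees every argument, and `ToySource(**src._captured_init_kwargs)` rebuilds an equivalent object.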
Ah, the ERDDAP catalog currently builds its entries using `LocalCatalogEntry`, but then the inputs don't end up in `_captured_init_kwargs`. Playing around with it, I see that if I instead build the entries using `TableDAPSource` from erddap.py, I get the inputs as keyword arguments, as you suggest. They then appear in `_captured_init_kwargs`, so the catalog can be saved.
Is it incorrect to use `LocalCatalogEntry` instead of making our own entries with `TableDAPSource`?
Using the entries is fine, and usually expected. A catalog always has entries of some sort, which resolve to source instances only on access. Normally, all the kwargs you are after are in the `._open_args` attribute of a `LocalCatalogEntry`. As far as I can tell, the erddap catalog does build a normal kwargs dict here, so I really can't tell where the `*args` you are struggling with are coming from. Could it be because some attributes of the entry are assigned after creation in that same code block?
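The question about attributes assigned after creation matters because, if capture happens in `__new__` (an assumption about intake's internals), only what passes through the constructor call is recorded. In the toy below, the hypothetical `ToyEntry` keeps keyword arguments in an `_open_args` dict and builds a hypothetical `ToySource` only on access, by calling `cls(**open_args)`. Anything attached to the source afterwards never reaches the captured kwargs.

```python
class CaptureMixin:
    """Hypothetical stand-in for intake's argument capture (an assumption)."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj._captured_init_args = args
        obj._captured_init_kwargs = kwargs
        return obj


class ToySource(CaptureMixin):
    """Hypothetical source built by the entry below."""

    def __init__(self, dataset_id, protocol, metadata=None):
        self.dataset_id = dataset_id
        self.protocol = protocol
        self.metadata = metadata or {}


class ToyEntry:
    """Hypothetical lazy entry: stores open args, builds the source on access."""

    def __init__(self, source_cls, open_args):
        self._source_cls = source_cls
        self._open_args = dict(open_args)

    def get(self):
        # Everything is passed by keyword, so nothing lands in the positional capture.
        return self._source_cls(**self._open_args)


entry = ToyEntry(ToySource, {"dataset_id": "ds1", "protocol": "tabledap"})
src = entry.get()
src.metadata = {"title": "set after creation"}  # not seen by the capture

print(src._captured_init_args)    # ()
print(src._captured_init_kwargs)  # {'dataset_id': 'ds1', 'protocol': 'tabledap'}
```

For a value to survive a save, it has to go into the open-args dict before the source is built (for example, a `"metadata"` key), rather than being set on the instance afterwards.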
@lukecampbell has been working on this, and indeed all he had to do to make it work was switch to keyword arguments.
I'll close this now. Thank you!