New pickling methods load all absolute links
emmanuelmathot opened this issue
Since PR #1285, and more specifically this change, it seems that a deepcopy of an item loads all the links that have an absolute href.
Is it intended or am I missing something?
This causes an issue when loading assets with the get_assets method, which first makes a deep copy of the STAC object, and that deep copy uses pickling.
When I load an item with unreachable links (e.g. an s3 URL with no custom IO reader set) and try to list the assets, it raises an error.
self.assets = list(
rio_tiler/io/stac.py:149: in _get_assets
for asset, asset_info in stac_item.get_assets().items():
venv/lib/python3.11/site-packages/pystac/asset.py:300: in get_assets
return {
venv/lib/python3.11/site-packages/pystac/asset.py:301: in <dictcomp>
k: deepcopy(v)
/usr/lib/python3.11/copy.py:172: in deepcopy
y = _reconstruct(x, memo, *rv)
/usr/lib/python3.11/copy.py:271: in _reconstruct
state = deepcopy(state, memo)
/usr/lib/python3.11/copy.py:146: in deepcopy
y = copier(x, memo)
/usr/lib/python3.11/copy.py:231: in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
/usr/lib/python3.11/copy.py:161: in deepcopy
rv = reductor(4)
venv/lib/python3.11/site-packages/pystac/item.py:179: in __getstate__
d["links"] = [
venv/lib/python3.11/site-packages/pystac/item.py:180: in <listcomp>
link.to_dict() if link.get_href() else link for link in d["links"]
venv/lib/python3.11/site-packages/pystac/link.py:181: in get_href
and self.owner.get_root()
venv/lib/python3.11/site-packages/pystac/stac_object.py:326: in get_root
root_link.resolve_stac_object()
venv/lib/python3.11/site-packages/pystac/link.py:330: in resolve_stac_object
obj = stac_io.read_stac_object(target_href, root=root)
venv/lib/python3.11/site-packages/pystac/stac_io.py:234: in read_stac_object
d = self.read_json(source, *args, **kwargs)
venv/lib/python3.11/site-packages/pystac/stac_io.py:205: in read_json
txt = self.read_text(source, *args, **kwargs)
venv/lib/python3.11/site-packages/pystac/stac_io.py:282: in read_text
return self.read_text_from_href(href)
venv/lib/python3.11/site-packages/pystac/stac_io.py:300: in read_text_from_href
with urlopen(req) as f:
/usr/lib/python3.11/urllib/request.py:216: in urlopen
return opener.open(url, data, timeout)
/usr/lib/python3.11/urllib/request.py:519: in open
response = self._open(req, data)
/usr/lib/python3.11/urllib/request.py:541: in _open
return self._call_chain(self.handle_open, 'unknown',
/usr/lib/python3.11/urllib/request.py:496: in _call_chain
result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <urllib.request.UnknownHandler object at 0x74a76915ba50>
req = <urllib.request.Request object at 0x74a768cd8dd0>
def unknown_open(self, req):
type = req.type
> raise URLError('unknown url type: %s' % type)
E urllib.error.URLError: <urlopen error unknown url type: s3>
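To make the chain in the traceback easier to follow: for an object without a dedicated copier, copy.deepcopy falls back to __reduce_ex__(4), which calls a user-defined __getstate__. Anything __getstate__ does, including I/O, therefore runs on every deep copy. Here is a minimal sketch of that mechanism using only the standard library; ToyLink and ToyItem are hypothetical stand-ins, not pystac classes, and the OSError merely simulates an unreadable href.

```python
import copy


class ToyLink:
    """Hypothetical stand-in for a link whose href lookup may trigger I/O."""

    def __init__(self, href):
        self.href = href
        self.resolve_calls = 0

    def get_href(self):
        # Stand-in for a lookup that may need to read a remote object.
        self.resolve_calls += 1
        if self.href.startswith("s3://"):
            raise OSError(f"cannot read {self.href}")
        return self.href


class ToyItem:
    """Hypothetical stand-in for an object that serializes its links."""

    def __init__(self, links):
        self.links = links

    def __getstate__(self):
        # Called by pickle AND by copy.deepcopy (through __reduce_ex__).
        state = self.__dict__.copy()
        state["links"] = [{"href": link.get_href()} for link in state["links"]]
        return state

    def __setstate__(self, state):
        state["links"] = [ToyLink(d["href"]) for d in state["links"]]
        self.__dict__.update(state)


ok = ToyItem([ToyLink("./sibling.json")])
copied = copy.deepcopy(ok)  # __getstate__ runs, so get_href is called

bad = ToyItem([ToyLink("s3://example-bucket/catalog.json")])
try:
    copy.deepcopy(bad)
    deepcopy_failed = False
except OSError:
    deepcopy_failed = True
```

In this toy model the deep copy of `bad` fails even though nothing asked for the link target, which mirrors the shape of the failure reported here.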
It looks to me like we've gotten bit by get_link()'s default to transform_hrefs=True again (for previous art, see #960). I'll open a PR with a fix.
I'm not sure this is true, see follow-on comment for more info.
@emmanuelmathot can you provide a minimal reproducible example so I can be sure I'm testing against the same problem? I was not able to reproduce the behavior you described with this test:
def test_non_existent_link_during_deepcopy(item: Item) -> None:
    item.add_link(pystac.Link("non-existent-asset", "../not-a-dir/not-a-file"))
    item = copy.deepcopy(item)
    assert item.get_single_link("non-existent-asset").href == "../not-a-dir/not-a-file"
sure, please find the test in this branch: https://github.com/emmanuelmathot/pystac/blob/pickle/tests/test_item.py#L686
@emmanuelmathot do you have an example that includes creating that test file? I'd like to be able to dig into the process that's actually doing the href modifications.
No, I do not, but a very simple item with a single link that has an absolute s3 href triggers the error.
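For context, the final error in the traceback does not depend on network access: the standard library's urlopen has no handler for the s3 scheme, so it fails before any request is sent. A small sketch of that behavior, where the bucket name is made up for illustration:

```python
from urllib.error import URLError
from urllib.request import urlopen

# Hypothetical href; no request is sent because no handler exists for "s3".
href = "s3://example-bucket/catalog.json"

try:
    urlopen(href)
    reason = None
except URLError as err:
    # err.reason carries the message raised by urllib's UnknownHandler.
    reason = str(err.reason)

print(reason)
```

This is why any code path that tries to read an s3 href through the default reader fails, regardless of whether the object exists.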
This is really similar to what you mentioned here:
"It looks to me like we've gotten bit by get_link()'s default to transform_hrefs=True again (for previous art, see #960)."
When I pass transform_href=False in the __getstate__ method:
d["links"] = [
    link.to_dict(transform_href=False) if link.get_href(transform_href=False) else link
    for link in d["links"]
]
the error no longer occurs.
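The effect of that flag can be sketched without pystac. In the toy model below, ToyLink, ToyItem, and resolve_against_root are hypothetical names and not the library's implementation; only the transform_href keyword mirrors the snippet above. With transform_href=False the stored href is returned untouched, so building the pickled state never attempts a root lookup.

```python
import copy


class ToyLink:
    """Hypothetical link; only the transform_href flag mirrors the thread."""

    def __init__(self, rel, href):
        self.rel = rel
        self.href = href

    def resolve_against_root(self):
        # Stand-in for the root lookup that failed on s3:// in the traceback.
        raise OSError(f"cannot read root for {self.href}")

    def get_href(self, transform_href=True):
        if transform_href:
            return self.resolve_against_root()
        return self.href

    def to_dict(self, transform_href=True):
        return {"rel": self.rel, "href": self.get_href(transform_href)}


class ToyItem:
    def __init__(self, links):
        self.links = links

    def __getstate__(self):
        state = self.__dict__.copy()
        # Same shape as the snippet above, with transformation switched off.
        state["links"] = [
            link.to_dict(transform_href=False)
            if link.get_href(transform_href=False)
            else link
            for link in state["links"]
        ]
        return state

    def __setstate__(self, state):
        # Rebuild link objects from the plain dicts stored in the state.
        state["links"] = [
            ToyLink(**link) if isinstance(link, dict) else link
            for link in state["links"]
        ]
        self.__dict__.update(state)


item = ToyItem([ToyLink("root", "s3://example-bucket/catalog.json")])
copied = copy.deepcopy(item)  # succeeds: no root lookup is attempted
```

Whether the real fix takes exactly this form is not stated in this thread; the sketch only shows why skipping the transformation avoids the read.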
@emmanuelmathot got it, thanks. The fix is in #1337, which we'll release after merging.