OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D

Home Page: https://ocr-d.de/core/

OcrdMets.get_physical_pages regression: assertion on non-empty result

bertsky opened this issue

This first surfaced via ocrd-import, which checks for each file whether adding it would create a clash for that pageId. In doing so, it usually expects an empty result for commands like ocrd workspace find -i some-new-id -k local_filename -k pageId.

Demonstrably, this now fails at the following assertion:

Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/venv38/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/data/ocr-d/ocrd_all/core/src/ocrd/cli/workspace.py", line 484, in workspace_find
    pages = workspace.mets.get_physical_pages(for_fileIds=fileIds)
  File "/data/ocr-d/ocrd_all/core/src/ocrd_models/ocrd_mets.py", line 696, in get_physical_pages
    assert for_fileIds # at this point we know for_fileIds is set, assert to convince pyright
AssertionError

So it seems the assertion is not warranted, and pyright was in fact right not to be convinced :)
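To illustrate the suspected cause, here is a minimal sketch. It assumes the lookup receives an empty list of file IDs when the find yields no matches, which the traceback does not show directly. The functions `lookup_truthy` and `lookup_explicit` and the `_PAGE_OF_FILE` table are hypothetical stand-ins, not the actual `OcrdMets.get_physical_pages` code. The point is only that a bare `assert for_fileIds` tests truthiness and therefore rejects `[]`, whereas `assert for_fileIds is not None` still narrows the `Optional` type for pyright while accepting an empty list.

```python
from typing import Dict, List, Optional

# Hypothetical file-to-page table standing in for the METS structMap lookup.
_PAGE_OF_FILE: Dict[str, str] = {
    "FILE_0001": "PHYS_0001",
    "FILE_0002": "PHYS_0002",
}


def lookup_truthy(for_fileIds: Optional[List[str]] = None,
                  for_pageIds: Optional[List[str]] = None) -> List[str]:
    """Mimics the failing pattern: the assert tests truthiness, not None-ness."""
    if for_pageIds is not None:
        return list(for_pageIds)
    assert for_fileIds  # an empty list is falsy, so [] raises AssertionError
    return [_PAGE_OF_FILE[file_id] for file_id in for_fileIds]


def lookup_explicit(for_fileIds: Optional[List[str]] = None,
                    for_pageIds: Optional[List[str]] = None) -> List[str]:
    """Possible alternative: only None is rejected, an empty list passes through."""
    if for_pageIds is not None:
        return list(for_pageIds)
    assert for_fileIds is not None  # narrows Optional[List[str]] to List[str]
    return [_PAGE_OF_FILE[file_id] for file_id in for_fileIds]
```

Under these assumptions, `lookup_explicit(for_fileIds=[])` returns an empty list, which is the result ocrd-import expects for a new ID, while `lookup_truthy(for_fileIds=[])` raises `AssertionError` as in the traceback above.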