workspace.download_file - not downloading transitive files

kba opened this issue 10 months ago

Konstantin Baierer commented 10 months ago

Noticed while fixing the broken tests in OCR-D/ocrd_kraken#42:

Here, we use Resolver.workspace_from_url without download, which copies the mets.xml and nothing else.

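import os
import shutil

import pytest
from ocrd import Resolver

@pytest.fixture()
def workspace(tmpdir):
    # start from an empty destination directory
    if os.path.exists(tmpdir):
        shutil.rmtree(tmpdir)
    # `assets` is the test suite's asset helper; no download is requested here
    workspace = Resolver().workspace_from_url(
        assets.path_to('kant_aufklaerung_1784/data/mets.xml'),
        dst_dir=tmpdir
    )
    return workspace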

In the processors, the PAGE-XML is downloaded via

pcgts = page_from_file(self.workspace.download_file(input_file))
image_url = pcgts.get_Page().imageFilename
# [...]
    image = self.workspace.resolve_image_as_pil(image_url)

This is apparently broken because the image file is not downloaded and tests fail.
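
The failure mode can be reproduced without OCR-D at all. The following is only a minimal sketch using the standard library: `clone_mets_only` and `resolve_local` are hypothetical stand-ins, not OCR-D functions, the file names are made up, and it assumes that relative file references are resolved against the workspace directory.

```python
import shutil
import tempfile
from pathlib import Path


def clone_mets_only(src_mets: Path, dst_dir: Path) -> Path:
    """Hypothetical stand-in: copy just the METS file, nothing it references."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(src_mets, dst_dir / src_mets.name))


def resolve_local(workspace_dir: Path, relative_ref: str) -> Path:
    """Hypothetical stand-in: resolve a relative reference against a workspace directory."""
    return workspace_dir / relative_ref


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    dst = Path(tmp) / "dst"

    # Build a tiny source "workspace": a METS file plus one image it would reference.
    (src / "OCR-D-IMG").mkdir(parents=True)
    (src / "mets.xml").write_text("<mets/>")
    (src / "OCR-D-IMG" / "page_0001.tif").write_bytes(b"")

    # Clone without downloading: only mets.xml arrives in the destination.
    clone_mets_only(src / "mets.xml", dst)

    # The same relative reference resolves in the source but not in the clone.
    image_ref = "OCR-D-IMG/page_0001.tif"
    found_in_source = resolve_local(src, image_ref).exists()
    found_in_clone = resolve_local(dst, image_ref).exists()
    print(found_in_source, found_in_clone)
```

Under these assumptions, any later step that expects the image next to the cloned mets.xml fails unless something fetches it first or resolves it against the original location.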

So either I debug this properly to find out why the baseurl mechanism does not work here, or we finally get rid of the long-deprecated resolve_image_as_pil altogether.

Robert Sachunsky · Answer 1 · Fri Aug 02 2024 05:14:22 GMT+0800 (China Standard Time)

> Here, we use Resolver.workspace_from_url without download, which copies the mets.xml and nothing else.

Like I ~~already~~ (later) said in #1149, cloning from local workspaces is still fundamentally broken.

> In the processors, the PAGE-XML is downloaded via
> This is apparently broken because the image file is not downloaded and tests fail.

Like I already said in #809, the download changes the relative local path that the PAGE files might expect.
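
To illustrate how relocating a file can change what a relative reference points to, here is a small path-arithmetic sketch. It is not OCR-D code: `resolve_from_page` is a hypothetical helper, the paths are invented, and it assumes the reference is resolved relative to the directory containing the PAGE file. Which base a given PAGE file actually expects is not stated in this thread.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_from_page(page_path: str, image_filename: str) -> str:
    """Hypothetical helper: resolve imageFilename against the PAGE file's directory."""
    page_dir = PurePosixPath(page_path).parent
    # normpath collapses the ".." segment so the two results are easy to compare
    return posixpath.normpath(str(page_dir / image_filename))


image_filename = "../OCR-D-IMG/page_0001.tif"

# PAGE file at its original location inside the workspace
original = resolve_from_page("data/OCR-D-GT-PAGE/page_0001.xml", image_filename)

# The same PAGE file after being stored somewhere else
moved = resolve_from_page("/tmp/ws/page_0001.xml", image_filename)

print(original)
print(moved)
```

The reference string is identical in both cases, yet the resolved location differs, so a PAGE file stored at a new path can end up pointing at an image that does not exist there.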

> or we finally get rid of the long-deprecated resolve_image_as_pil altogether.

I cannot see anything wrong with that function itself.