podofo / podofo

A C++17 PDF manipulation library

Home Page: https://podofo.github.io/podofo/documentation

Default example of PdfStreamedDocument throws exception on M1 Mac

Grantim opened this issue

Bug report
Hello! I'm trying to use PoDoFo on a Mac, and it fails on the default example below. I think it is specific to ARM64 Macs, but I'm not sure.

* Example of using PdfStreamedDocument:
*
* PdfStreamedDocument document;
* document.Load("outputfile.pdf");
* PdfPage& page = document.GetPages().CreatePage(PdfPage::CreateStandardPageSize(PdfPageSize::A4));
* PdfFont* font = document.GetFonts().SearchFont("Arial");
*
* PdfPainter painter;
* painter.SetCanvas(page);
* painter.TextState.SetFont(*font, 18);
* painter.DrawText("Hello World!", 56.69, page.GetRect().Height - 56.69);
* painter.FinishDrawing();

#include <podofo/podofo.h>
using namespace PoDoFo;

int main()
{
    // PdfStreamedDocument apparently has no default constructor in 0.10.1, so the output file is passed to the constructor instead of calling Load()
    PdfStreamedDocument document("outputfile.pdf");
    PdfPage& page = document.GetPages().CreatePage(PdfPage::CreateStandardPageSize(PdfPageSize::A4));
    PdfFont* font = document.GetFonts().SearchFont("Arial");
    if (font == nullptr)
        return 1; // "Arial" was not found on this system

    PdfPainter painter;
    painter.SetCanvas(page);
    painter.TextState.SetFont(*font, 18);
    painter.DrawText("Hello World!", 56.69, page.GetRect().Height - 56.69);
    painter.FinishDrawing();
    return 0;
}

Output:

C++ exception with description "PdfErrorCode::NotImplemented, This feature is currently not implemented.
Callstack:
    #0 Error Source: main/PdfStreamedObjectStream.cpp(93), Information: Unsupported reading from streamed object stream" thrown in the test body.

Environment:

  • Version/git revision: 0.10.1
  • Operating System: macOS arm64
  • Package manager used: brew

Almost 100% sure this is not ARM64 specific. PdfStreamedDocument is one of the features I never used myself; I think it is meant as a way to create new documents by immediately writing object streams back to the file, so they don't reside in RAM (@domseichter: correct?). Unfortunately it's another feature that was not unit tested, so it went untested for a long time. The idea, I believe, is also that streams can't be read back, and there should also be a check that makes an object immutable once it is written, but I accidentally removed it (I should restore it in a slightly different way). The workaround for now is to use PdfMemDocument instead of PdfStreamedDocument.
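To make the "write immediately so objects don't reside in RAM" idea concrete, here is a minimal sketch that assumes nothing about PoDoFo's internals. `ToyImmediateWriter` and `WriteObject` are hypothetical names: the class serializes each object to an `std::ostream` as soon as it is complete and remembers only its byte offset, which is the information a cross-reference table needs later.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream> // std::ostringstream, handy for trying the writer in memory
#include <string>
#include <vector>

// Hypothetical toy writer (not PoDoFo's PdfImmediateWriter): each object is
// serialized to the output as soon as it is complete, and only its byte offset
// is kept, which is what a cross-reference table needs.
class ToyImmediateWriter
{
public:
    explicit ToyImmediateWriter(std::ostream& out) : m_out(out) {}

    // Writes "N 0 obj ... endobj" right away and returns the object number N.
    int WriteObject(const std::string& body)
    {
        const int number = static_cast<int>(m_offsets.size()) + 1;
        m_offsets.push_back(static_cast<std::size_t>(m_out.tellp()));
        m_out << number << " 0 obj\n" << body << "\nendobj\n";
        return number; // the body itself is not retained
    }

    // One offset per written object, indexed by object number - 1.
    const std::vector<std::size_t>& Offsets() const { return m_offsets; }

private:
    std::ostream& m_out;
    std::vector<std::size_t> m_offsets;
};
```

The trade-off is the one described above: memory stays flat regardless of document size, but once `WriteObject` returns, the object's bytes live only in the output, so nothing can be read back from memory.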

It worked on Windows with an older version.

By older version, do you mean 0.9.x? Based on your report and a brief test, I believe I regressed this functionality in 0.10.x. Sorry about that.

Yes, 0.9.x. I switched to PdfMemDocument and it works for me, thanks.

Notes to fix this issue:

  • Remove mutability of PdfObject (remove SetString/SetName/SetReal). Make SetNumber private (used by PdfStreamedDocument functionality);
  • Re-introduce private SetImmutable in PdfObject/PdfDictionary/PdfArray (used by PdfStreamedDocument functionality), use it after writing the object as it was in 0.9.x;
  • Retest under these conditions.

@ceztko, is there a place where the streams design is explained? I tried to apply the changes you suggested, but it seems other things are missing as well, like setBool.

I'm trying to understand the root cause of the problem: how does PdfObject mutability lead the code into unique_ptr<InputStream> PdfStreamedObjectStream::GetInputStream(PdfObject& obj), which is not implemented?

Any additional explanation on this issue is very much welcomed, thanks!

@cosmin42 "stream" is an overloaded word, both in the PDF specification and in PoDoFo. For the I/O infrastructure involving "streams", the design was described in this message[1] (see the attachment as well). For PdfStreamedDocument, I believe the code ends up in GetInputStream because of lazy compression of plain/uncompressed streams, which was introduced only recently[2].

Anyway, the good news is I want to fix this myself now, as I really want to ditch PdfObject mutability.

[1] https://sourceforge.net/p/podofo/mailman/message/37676859/
[2]

// Try to compress the flate compress the stream if it has no filters,
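The chain of events described above can be modeled in a few lines. This is a hypothetical sketch, not PoDoFo code: `ToyStream`, `ReadAll`, `Deflater` and `CompressIfUnfiltered` are illustrative names, and the deflate step is injected as a callback so the example needs no zlib. It shows why lazy compression of an unfiltered stream has to read the raw bytes back, and why that read must fail when the stream data was already forwarded to the output file.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model, not PoDoFo code. A memory-backed stream keeps its bytes
// and can be read back; a "streamed" one has already forwarded its bytes to
// the output file and keeps nothing.
struct ToyStream
{
    bool streamed = false;             // true: data already written out, not retained
    std::vector<std::string> filters;  // e.g. "FlateDecode"
    std::string data;                  // meaningful only when !streamed

    std::string ReadAll() const
    {
        if (streamed)
            throw std::logic_error("Unsupported reading from streamed object stream");
        return data;
    }
};

using Deflater = std::function<std::string(const std::string&)>;

// Hypothetical lazy-compression step run at serialization time: a stream with
// no filters gets compressed, which first requires reading its raw bytes back.
void CompressIfUnfiltered(ToyStream& stream, const Deflater& deflate)
{
    if (!stream.filters.empty())
        return;                              // already filtered: leave it alone
    stream.data = deflate(stream.ReadAll()); // throws for streamed storage
    stream.filters.push_back("FlateDecode");
}
```

With a memory-backed stream the compression succeeds; with a streamed one, `ReadAll` throws an error mirroring the one reported in the first post, and the stream is left untouched.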


That's great to hear!

Based on the 0.9.x implementation, the mutable property seems to be a runtime sanity check because it is used only in the AssertMutable() function.

Beyond the InputIterator problem, there is another one in the PdfImmediateWriter finish method: PODOFO_RAISE_ERROR_INFO(PdfErrorCode::NotImplemented, "FIX-ME: The following is already done by PdfXRef now");

I may keep PdfObject mutability after all. The point of SetImmutable() is to block mutability after the object is written in the PdfImmediateWriter code path, which is of course verified by AssertMutable(). I'm making progress, but it's not an easy task, especially after many general improvements in PoDoFo and my desire to stop coding things in a hacky way.
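A minimal sketch of the SetImmutable()/AssertMutable() pattern, assuming a simple boolean flag that every mutator checks first. `ToyObject` is hypothetical and is not PoDoFo's PdfObject; in particular, `SetImmutable()` is public here for brevity, whereas the notes above make it private and reserved to the PdfStreamedDocument code path.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy object, loosely modeled on the SetImmutable()/AssertMutable()
// pair discussed above; not PoDoFo's PdfObject, names are illustrative.
class ToyObject
{
public:
    void SetString(std::string value)
    {
        AssertMutable(); // every mutating method checks the flag first
        m_value = std::move(value);
    }

    const std::string& GetString() const { return m_value; }

    // Meant to be called by a (hypothetical) immediate writer right after
    // serializing the object: its bytes are already in the output, so any
    // later edit would silently be lost.
    void SetImmutable() { m_immutable = true; }

private:
    // Runtime sanity check: turns a silent data loss into an exception.
    void AssertMutable() const
    {
        if (m_immutable)
            throw std::logic_error("object already written, it cannot be modified");
    }

    std::string m_value;
    bool m_immutable = false;
};
```

The check costs one branch per mutation and never changes what gets written; it only makes misuse of an already-written object fail loudly.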

Fixing this is proving to be hard because of a combination of hacks in the previous implementation, which are going to be removed, and new facilities for I/O and for handling the cross-reference table/stream that were introduced and must be used now. I pushed some groundwork toward a fix in 73fa692, but not a final fix yet (a few more days are needed for that). In the end a unit test like the one in the first post will be enabled, but since I never used PdfStreamedDocument, nor do I plan to start using it, more bugs may show up even after that test case is fixed. We'll see.

The code has been extensively cleaned up and all the hacks are now gone. Streaming operations should work provided that dictionaries are finalized before the object streams are written, and that no simultaneous streaming operations are requested (if that happens, an exception is thrown). The exact sample from the first post works, and I added it to the unit tests. If there are further problems, please open a new issue. Again, beware that I don't use PdfStreamedDocument myself, but I didn't want to remove it, so I couldn't leave it broken.
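The "no simultaneous streaming operations" rule can be sketched with an RAII guard. This is a hypothetical illustration, not the actual PoDoFo implementation: `ToyStreamingDocument` and `StreamingScope` are made-up names, and the sketch covers only the concurrency check, not the requirement that dictionaries be finalized first.

```cpp
#include <stdexcept>

// Hypothetical sketch: a document-level flag is claimed by a guard for the
// duration of one streaming operation; a second concurrent request throws.
class ToyStreamingDocument
{
public:
    class StreamingScope
    {
    public:
        explicit StreamingScope(ToyStreamingDocument& doc) : m_doc(doc)
        {
            if (m_doc.m_streaming)
                throw std::runtime_error("another streaming operation is in progress");
            m_doc.m_streaming = true;
        }

        // Releases the flag when the operation's scope ends.
        ~StreamingScope() { m_doc.m_streaming = false; }

        StreamingScope(const StreamingScope&) = delete;
        StreamingScope& operator=(const StreamingScope&) = delete;

    private:
        ToyStreamingDocument& m_doc;
    };

    bool IsStreaming() const { return m_streaming; }

private:
    bool m_streaming = false;
};
```

Because the second guard's constructor throws, its destructor never runs, so the first operation keeps ownership of the flag until its own scope ends.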

I just tested the changes and it works great. From here on I should be able to fix any further bugs myself. Thanks!