UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

Home Page: https://github.com/UglyToad/PdfPig/wiki

Infinite loop in GlyphDataTable.ReadCompositeGlyph()

sascha-schwegelbauer opened this issue

I have a PDF file (unfortunately not publicly available) that causes a crash during ContentOrderTextExtractor.GetText().
After a basic analysis, I can say that the do-while loop in GlyphDataTable.ReadCompositeGlyph() never exits.
It keeps adding the same glyph (same index, same transform matrix values) to the list of CompositeComponents.
Adding a conditional break (checking whether the list already contains a component with the same index and the same matrix values) exits the loop, but leads to an IndexOutOfRangeException later in GlyphDataTable.ReadFlags().
Unfortunately, I don't have a good idea for how to fix this issue, so this is just FYI.
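
For reference, here is a minimal sketch of how a composite-glyph component loop can be bounded so that corrupt data fails fast instead of looping forever. It is written in Python rather than C# and is not PdfPig's code. The flag bits follow the TrueType 'glyf' table layout; the function name `read_composite_components`, the helper `_unpack`, and the `MAX_COMPONENTS` cap are hypothetical names chosen for illustration. Whether this addresses the actual cause in the affected file is an open question, since the file cannot be shared.

```python
import struct

# Component flag bits from the TrueType 'glyf' table layout.
ARG_1_AND_2_ARE_WORDS = 0x0001
WE_HAVE_A_SCALE = 0x0008
MORE_COMPONENTS = 0x0020
WE_HAVE_AN_X_AND_Y_SCALE = 0x0040
WE_HAVE_A_TWO_BY_TWO = 0x0080

MAX_COMPONENTS = 1024  # hypothetical safety cap, not a value taken from PdfPig


def _unpack(fmt, data, offset):
    """Bounds-checked struct read; returns (values, new_offset)."""
    size = struct.calcsize(fmt)
    if offset + size > len(data):
        raise ValueError("composite glyph data is truncated")
    return struct.unpack_from(fmt, data, offset), offset + size


def read_composite_components(data, offset=0):
    """Parse component records starting at `offset`.

    Returns a list of (glyph_index, arg1, arg2, scales) tuples.
    Raises ValueError instead of looping forever on bad data.
    """
    components = []
    while True:
        if len(components) >= MAX_COMPONENTS:
            raise ValueError("too many components; data is probably corrupt")

        (flags, glyph_index), offset = _unpack(">HH", data, offset)

        # Arguments are read as signed offsets here; point-number
        # arguments (unsigned) are not handled in this sketch.
        arg_fmt = ">hh" if flags & ARG_1_AND_2_ARE_WORDS else ">bb"
        (arg1, arg2), offset = _unpack(arg_fmt, data, offset)

        # The transform is 0, 1, 2 or 4 F2Dot14 values depending on flags.
        if flags & WE_HAVE_A_TWO_BY_TWO:
            count = 4
        elif flags & WE_HAVE_AN_X_AND_Y_SCALE:
            count = 2
        elif flags & WE_HAVE_A_SCALE:
            count = 1
        else:
            count = 0
        raw = ()
        if count:
            raw, offset = _unpack(">" + "h" * count, data, offset)
        scales = tuple(value / 16384.0 for value in raw)

        components.append((glyph_index, arg1, arg2, scales))

        # Exit depends only on the flags just read; every iteration
        # consumes at least 6 bytes, so the loop always terminates.
        if not flags & MORE_COMPONENTS:
            return components
```

The point of the sketch is that every read advances the offset and is bounds-checked, so a record that claims MORE_COMPONENTS without any following data raises an error rather than yielding the same component again. A duplicate-component check like the one described above would stop the loop too, but it would also reject a valid glyph that legitimately repeats a component with the same transform.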