UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

Home Page: https://github.com/UglyToad/PdfPig/wiki

Allow reading order detectors to support any class that has a bounding box/PdfRectangle

davebrokit opened this issue

Currently the IReadingOrderDetector interface takes TextBlock as its parameter. This limits its use to the TextBlock class.

I propose adding an IBoundingBox interface:

public interface IBoundingBox
{
    PdfRectangle BoundingBox { get; }
}

Then change the IReadingOrderDetector interface and its implementing classes to take IBoundingBox as the parameter.
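
To make the proposal concrete, here is a minimal sketch of what a detector generic over IBoundingBox might look like. Everything except IBoundingBox is an assumption for illustration: the generic Get<T> signature, the TopDownReadingOrderDetector class, and the Region type are hypothetical, and PdfRectangle is a local stand-in for the PdfPig type so the snippet compiles on its own (remove it when referencing the library). The ordering rule (top to bottom, then left to right) is a placeholder, not one of the library's algorithms.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for PdfPig's PdfRectangle so this sketch is self-contained.
public readonly record struct PdfRectangle(double Left, double Bottom, double Right, double Top);

// The interface proposed in this issue.
public interface IBoundingBox
{
    PdfRectangle BoundingBox { get; }
}

// Hypothetical shape of the changed detector interface:
// it accepts any element type that exposes a bounding box.
public interface IReadingOrderDetector
{
    IReadOnlyList<T> Get<T>(IReadOnlyList<T> elements) where T : IBoundingBox;
}

// Illustrative detector only: sorts top to bottom, then left to right.
public sealed class TopDownReadingOrderDetector : IReadingOrderDetector
{
    public IReadOnlyList<T> Get<T>(IReadOnlyList<T> elements) where T : IBoundingBox
    {
        return elements
            .OrderByDescending(e => e.BoundingBox.Top) // PDF y axis points up, so a larger Top is higher on the page
            .ThenBy(e => e.BoundingBox.Left)
            .ToList();
    }
}

// Hypothetical non-TextBlock element, e.g. an image or table region.
public sealed record Region(string Name, PdfRectangle BoundingBox) : IBoundingBox;
```

With this shape, a list of Region values (or any other type implementing IBoundingBox) can be passed to Get without wrapping it in a TextBlock.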

Adding an overload that takes a Func<T, PdfRectangle> would let the caller supply the bounding box for any type, even one that does not implement IBoundingBox, making the interface more useful.
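
A rough sketch of the Func<T, PdfRectangle> overload idea follows. The class name ReadingOrderSketch, the method name Order, and the top-to-bottom ordering rule are assumptions made for illustration, and PdfRectangle is again a local stand-in for the PdfPig type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for PdfPig's PdfRectangle so this sketch is self-contained.
public readonly record struct PdfRectangle(double Left, double Bottom, double Right, double Top);

public static class ReadingOrderSketch
{
    // Hypothetical overload: the caller says how to get a bounding box from each element,
    // so T does not need to implement any particular interface.
    public static IReadOnlyList<T> Order<T>(
        IReadOnlyList<T> elements,
        Func<T, PdfRectangle> getBoundingBox)
    {
        if (elements is null) throw new ArgumentNullException(nameof(elements));
        if (getBoundingBox is null) throw new ArgumentNullException(nameof(getBoundingBox));

        return elements
            .Select(e => (Element: e, Box: getBoundingBox(e))) // evaluate the selector once per element
            .OrderByDescending(x => x.Box.Top)                 // placeholder rule: top to bottom
            .ThenBy(x => x.Box.Left)                           // then left to right
            .Select(x => x.Element)
            .ToList();
    }
}
```

A caller could then write something like ReadingOrderSketch.Order(items, i => i.Box) for a type that stores its rectangle under a different member name.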

Breaking changes: IReadingOrderDetector would instead return an IReadOnlyList<T> containing the ordered results. This means TextBlock.ReadingOrder would no longer be set, which is a breaking change. However, code could be added so that ReadingOrder is still set when T is TextBlock.
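
One way the backward-compatibility idea could look is sketched below. The TextBlock class here is a simplified stand-in with a publicly settable ReadingOrder, which is an assumption; how the real PdfPig TextBlock exposes that value may differ. CompatSketch and the ordering rule are likewise hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-ins so this sketch is self-contained; they are not the PdfPig types.
public readonly record struct PdfRectangle(double Left, double Bottom, double Right, double Top);

public interface IBoundingBox
{
    PdfRectangle BoundingBox { get; }
}

public sealed class TextBlock : IBoundingBox
{
    public TextBlock(PdfRectangle boundingBox) => BoundingBox = boundingBox;

    public PdfRectangle BoundingBox { get; }

    // Assumed to be settable here; -1 means "not yet ordered".
    public int ReadingOrder { get; set; } = -1;
}

public static class CompatSketch
{
    public static IReadOnlyList<T> Get<T>(IReadOnlyList<T> elements) where T : IBoundingBox
    {
        var ordered = elements
            .OrderByDescending(e => e.BoundingBox.Top) // placeholder rule: top to bottom
            .ThenBy(e => e.BoundingBox.Left)           // then left to right
            .ToList();

        // Keep the old behaviour for TextBlock: record each block's position in the result.
        for (var i = 0; i < ordered.Count; i++)
        {
            if (ordered[i] is TextBlock block)
            {
                block.ReadingOrder = i;
            }
        }

        return ordered;
    }
}
```

The type check runs per element, so callers that pass TextBlock still see ReadingOrder populated while other element types are simply returned in order.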

Happy to make the changes.

commented

@davebrokit I was thinking of doing something similar, so please go ahead and implement your idea.

I wrote a similar interface for my project: https://github.com/BobLd/Caly/blob/master/Caly.Pdf/Models/IPdfTextElement.cs. Feel free to reuse it or not.

I think the Letter class has a method instead of a property to get the bounding box. This might be a good opportunity to change that too. In my mind, letters, text lines and text blocks should all implement your interface, but please let me know what you think.