TextBlock and inner_tokens
tlienart opened this issue · comments
This path is currently removed. Before, what we were doing was:
- For a text span, determine which tokens are in that span
- Bundle it in a TextBlock, with inner tokens being a view of the tokens in the relevant range.
See here in partition:
FranklinParser.jl/src/partition.jl
Lines 46 to 76 in e9d75f9
The relevant function was using findfirst/findlast, and the time spent there ended up accumulating a lot:
FranklinParser.jl/src/utils/types.jl
Lines 74 to 83 in e9d75f9
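For reference, a minimal sketch of what the previous approach amounts to, assuming tokens are sorted by position and do not overlap. The `Token` and `TextBlock` structs and the `text_block_scan` function are simplified, hypothetical stand-ins, not the actual FranklinParser definitions:

```julia
# Hypothetical, simplified stand-ins; the real FranklinParser types differ.
struct Token
    name::Symbol
    from::Int   # index of the first character of the token in the source
    to::Int     # index of the last character of the token in the source
end

struct TextBlock{V<:AbstractVector{Token}}
    from::Int
    to::Int
    inner_tokens::V   # a view into the parent token vector, no copy
end

# Linear scan: locate the first and last token contained in [from, to].
# Each call walks the token vector from both ends, so the cost per block
# grows with the total number of tokens.
function text_block_scan(tokens::Vector{Token}, from::Int, to::Int)
    i = findfirst(t -> t.from >= from, tokens)
    j = findlast(t -> t.to <= to, tokens)
    if i === nothing || j === nothing || i > j
        # no token falls inside the span: empty view
        return TextBlock(from, to, @view tokens[1:0])
    end
    return TextBlock(from, to, @view tokens[i:j])
end

tokens = [Token(:a, 1, 2), Token(:b, 5, 6), Token(:c, 9, 10), Token(:d, 14, 15)]
tb = text_block_scan(tokens, 4, 11)
names = [t.name for t in tb.inner_tokens]
```

With many text blocks over a long token vector, these repeated scans are presumably where the accumulated cost comes from.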
However, there's no magic: if we have to retokenize every text block on the Franklin side, it does take some time. So ideally we'd do this in a performant way when the TextBlock is built, without this findfirst/findlast stuff.
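One possible way to avoid the repeated scans, sketched under the assumption that both the tokens and the text spans are sorted and non-overlapping, is to keep a single cursor into the token vector that only moves forward. The `Token` struct and `inner_token_ranges` function below are hypothetical and only illustrate the idea; this is not what the package currently does:

```julia
# Hypothetical, simplified token type; the real FranklinParser type differs.
struct Token
    name::Symbol
    from::Int   # index of the first character of the token in the source
    to::Int     # index of the last character of the token in the source
end

# Single forward pass over the tokens for all spans together.
# Assumes `tokens` is sorted by position and `spans` is sorted and
# non-overlapping. Returns, for each span, the range of token indices
# fully contained in it (possibly empty).
function inner_token_ranges(tokens::Vector{Token}, spans::Vector{UnitRange{Int}})
    ranges = Vector{UnitRange{Int}}(undef, length(spans))
    n = length(tokens)
    cursor = 1   # index of the first token not yet consumed
    for (k, span) in enumerate(spans)
        # skip tokens that start before this span
        while cursor <= n && tokens[cursor].from < first(span)
            cursor += 1
        end
        start = cursor
        # consume tokens that end inside this span
        while cursor <= n && tokens[cursor].to <= last(span)
            cursor += 1
        end
        ranges[k] = start:(cursor - 1)
    end
    return ranges
end

tokens = [Token(:a, 1, 2), Token(:b, 5, 6), Token(:c, 9, 10), Token(:d, 14, 15)]
ranges = inner_token_ranges(tokens, [1:3, 4:11, 12:20])
inner = [view(tokens, r) for r in ranges]   # views, no copies
```

Each token is visited at most once across all spans, so the total work is proportional to the number of tokens plus the number of spans, rather than tokens times spans.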
Once this is done, Xranklin needs to use it properly; e.g. the env processing needs to use the inner tokens instead of just repartitioning.
Also potentially item candidates, rows, links, ...
For links, need to review whether what is done in Xranklin is over the top; it seems like link-type detection is done again from b.ss, which should not be required.