jsonkenl / xlsxir

Xlsx parser for the Elixir language.

Complex shared strings throw out indexes

brushbox opened this issue

Problem

I have a workbook that has some colored text. In my app it parses successfully, but the field values are incorrect. The values are correct in Numbers, LibreOffice, and Ruby's "Simple Xlsx Reader" gem.

It comes down to the parsing of the shared strings. In the shared strings I have:

  <si>
    <r>
      <t>Red writing</t>
    </r>
    <r>
      <rPr>
        <sz val="10"/>
        <rFont val="Arial"/>
        <family val="2"/>
      </rPr>
      <t>: Indicates no perament walker, but these Walkers are helping out this week</t>
    </r>
    <phoneticPr fontId="0" type="noConversion"/>
  </si>

In Xlsxir this is turned into two strings: "Red writing" followed by ": Indicates no perament walker, but these Walkers are helping out this week". In other parsers it is a single string: "Red writing: Indicates no perament walker, but these Walkers are helping out this week".

This throws the shared string indexes out by one, so every cell that references a shared string after the one above gets the wrong value.

Current Status

I am looking to reduce the problem spreadsheet to the smallest failing example. I will look into getting a test case written and will try to find a fix. I might need some help with the SAX parser side.

Here is a very small file (two cells) that shows the issue:

wp2.xlsx

  • A1 should be "RED: BLACK"
  • A2 should be "Data"

Instead we get:

  • A1 => "RED"
  • A2 => ": BLACK"
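
To illustrate the shift, here is a minimal Python sketch. It is not Xlsxir's Elixir code: the sample XML is a hand-written stand-in for what the shared strings of wp2.xlsx presumably look like, and both function names are hypothetical. It compares collecting one entry per <t> with collecting one entry per <si>.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, in ElementTree's "{uri}tag" notation.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Assumed stand-in for the sharedStrings.xml of wp2.xlsx (not the real file).
SAMPLE = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si>
    <r><t>RED</t></r>
    <r><rPr><sz val="10"/></rPr><t>: BLACK</t></r>
    <phoneticPr fontId="0" type="noConversion"/>
  </si>
  <si><t>Data</t></si>
</sst>"""


def shared_strings_per_t(xml_text):
    """Buggy variant: one entry per <t>, so a rich-text item adds extra entries."""
    root = ET.fromstring(xml_text)
    return [t.text or "" for t in root.iter(NS + "t")]


def shared_strings_per_si(xml_text):
    """One entry per <si>: join the text of a plain <t> or of every <r> run."""
    root = ET.fromstring(xml_text)
    strings = []
    for si in root.findall(NS + "si"):
        # Only direct <t> and <r>/<t> are read, so text nested under
        # other children of <si> (such as phonetic runs) is ignored.
        parts = si.findall(NS + "t") + si.findall(NS + "r/" + NS + "t")
        strings.append("".join(t.text or "" for t in parts))
    return strings


print(shared_strings_per_t(SAMPLE))   # index 1 is ": BLACK"
print(shared_strings_per_si(SAMPLE))  # index 1 is "Data"
```

With one entry per <t>, a cell that references shared string 1 gets ": BLACK"; with one entry per <si>, it gets "Data", which matches the expected A2.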

I'll get the test cases done for this.

@brushbox this shouldn't be too difficult to fix. I have a feeling the issue results from the way I have it set up to parse very long strings, where the family string attribute comes into play. Thanks for creating a sample workbook. I'll take a look and see what I can come up with.

@jsonkennell hi. Thanks for letting me know. I'm looking into it but have been caught up with a few other things lately.

I'll let you know once I've had time to go over things properly.

Merged your PR, so closing this for now.