jsonkenl / xlsxir

Xlsx parser for the Elixir language.

Order of rows does not match the source file

pma opened this issue · comments

At least in the .xlsx file I used, the row number is stored as a string in the ETS table, so Enum.sort compares the keys lexicographically and does not return the rows in source order: https://github.com/kennellroxco/xlsxir/blob/master/lib/xlsxir.ex#L125.
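A minimal sketch of the sorting problem described above. The key values are made up for illustration and are not taken from the actual file or ETS table:

```elixir
# Row keys as strings, the way the issue describes them in the ETS table.
# (Illustrative values only.)
string_keys = ["1", "2", "10", "11", "3"]

# Strings compare lexicographically, so "10" sorts before "2".
sorted_as_strings = Enum.sort(string_keys)
IO.inspect(sorted_as_strings)

# Converting the keys to integers first restores numeric order.
sorted_as_integers =
  string_keys
  |> Enum.map(&String.to_integer/1)
  |> Enum.sort()

IO.inspect(sorted_as_integers)
```

With string keys the rows come back as 1, 10, 11, 2, 3; with integer keys they come back as 1, 2, 3, 10, 11.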

I worked around this by not using get_list and accessing the ETS table directly. I need the row number so I can show it to the user on any validation error, but I don't necessarily need to process the rows in the order of the source file.
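A rough sketch of that kind of direct ETS access. The table name, the `{row_number, cells}` layout, and the "empty cell" validation rule are all assumptions made for illustration; the real xlsxir table layout may differ:

```elixir
# Hypothetical table layout: {row_number, cells}. Not the actual xlsxir schema.
table = :ets.new(:rows_sketch, [:set, :protected])

:ets.insert(table, [
  {1, ["name", "qty"]},
  {2, ["apples", "3"]},
  {3, ["pears", ""]},
  {4, ["", "7"]}
])

# Fold over the table in whatever order ETS yields the entries.
# The row number travels with each entry, so errors can still cite it.
errors =
  :ets.foldl(
    fn {row, cells}, acc ->
      if Enum.any?(cells, &(&1 == "")) do
        ["row #{row}: empty cell" | acc]
      else
        acc
      end
    end,
    [],
    table
  )

# Sorted here only to make the printed output stable.
IO.inspect(Enum.sort(errors))
```

Because each entry carries its own row number, the iteration order of the table does not matter for reporting validation errors.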

I also think the use case of iterating over the rows could be implemented considerably more efficiently if it were handled as a special case, avoiding a lot of intermediate representations and transformations.

Thanks, Paulo. There was a bug: I intended for rows to be represented as integers in the ETS table, but they were in fact strings. I've fixed that and pushed the change to the repo, so please re-pull and let me know if it resolves the issue you had.

Do you have any specific suggestions with regard to how I could make the row iterations more efficient? I'm currently looking at ways to implement GenStage.Flow to improve performance.

@kennellroxco Thanks! It solved the issue and get_list now returns the rows in the original order.

I may be a bit biased because I currently only need get_list. But it seems that the steps parse xml rows -> load into ets kv -> iterate ets and transform to list -> sort list -> map list involve a lot of transformations and intermediate data structures, when the output of the SAX parser could be the final ordered list, already with the final mapping.
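A simplified sketch contrasting the two routes described above. The `{:row, number, cells}` events are a hypothetical stand-in for what a SAX-style parser might emit in document order; this is not xlsxir's actual internal representation:

```elixir
# Hypothetical parser output: one event per row, already in document order.
events = [
  {:row, 1, ["a", "1"]},
  {:row, 2, ["b", "2"]},
  {:row, 3, ["c", "3"]}
]

# Route 1: load into ETS, read everything back, sort by row number, map to cells.
table = :ets.new(:pipeline_sketch, [:set])
Enum.each(events, fn {:row, n, cells} -> :ets.insert(table, {n, cells}) end)

via_ets =
  table
  |> :ets.tab2list()
  |> Enum.sort()
  |> Enum.map(fn {_n, cells} -> cells end)

# Route 2: accumulate the final list while consuming events, then reverse once.
direct =
  events
  |> Enum.reduce([], fn {:row, _n, cells}, acc -> [cells | acc] end)
  |> Enum.reverse()

IO.inspect(via_ets == direct)
```

Both routes produce the same ordered list here; the second one skips the ETS table and the sort, which is the saving the comment above is pointing at.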

But comparing the times of extract and get_list for a 55K-row file, it's actually about 30s for extract vs 2s for get_list. So it seems it's not worth optimizing the get_list use case unless the parsing itself can be made faster.
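One way to take this kind of per-stage measurement is Erlang's `:timer.tc/1`. In the sketch below, `parse_stage` and `list_stage` are hypothetical stand-ins, not the real extract and get_list calls, so the numbers it prints say nothing about xlsxir itself:

```elixir
# Hypothetical stand-ins for the two stages; swap in the real calls to measure them.
parse_stage = fn -> Enum.map(1..55_000, &Integer.to_string/1) end
list_stage = fn rows -> rows |> Enum.map(&String.to_integer/1) |> Enum.sort() end

# :timer.tc/1 returns {elapsed_microseconds, result}.
{parse_us, rows} = :timer.tc(parse_stage)
{list_us, sorted} = :timer.tc(fn -> list_stage.(rows) end)

IO.puts("parse stage: #{parse_us} us, list stage: #{list_us} us")
```

Timing each stage separately shows where the bulk of the time goes before deciding which one to optimize.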

Well, I'm hoping to optimize the parsing with GenStage.Flow. I'll let you know when I figure something out so you can give it a try and tell me how it goes. Closing this issue for now. Thanks!