pythonicrubyist / creek

Ruby library for parsing large Excel files.

Home Page: http://rubygems.org/gems/creek

Text read from excel files is html escaped

davich opened this issue

I have an xlsx file with an ampersand (&) in it, and when I read the file using Creek, it becomes &amp;. Is this kind of HTML escaping expected? Is there a way of turning it off? Thanks

testfile.xlsx

require 'creek'

# file_path is the path to the attached testfile.xlsx
book = Creek::Book.new file_path
book.sheets[0].rows.each do |row|
  p row.values
end
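For anyone on a Creek version that still returns escaped text, one possible stopgap is to decode the entities after reading. The sketch below is only that: `unescape_row_values` is a hypothetical helper, not part of Creek, and it assumes each row is a Hash of cell reference to value, as the `row.values` call above suggests. The sample row stands in for real Creek output so the snippet runs on its own.

```ruby
require 'cgi'

# Hypothetical helper (not part of Creek): decode HTML/XML entities in the
# string values of one row hash, leaving non-string values untouched.
def unescape_row_values(row)
  row.transform_values do |value|
    value.is_a?(String) ? CGI.unescapeHTML(value) : value
  end
end

# Stand-in for a row as Creek might yield it: cell reference => cell value.
row = { 'A1' => 'abc', 'B1' => 'j&amp;k', 'C1' => nil }
p unescape_row_values(row).values
```

Note that this decodes every string value, so a cell that is read correctly and really contains the literal text `&amp;` would be changed as well. It is a workaround, not a fix for the underlying parsing.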

@davich

Hi, I entered the value a&b in a cell and I can read it as is. I am not facing this problem. Can you check it now?

Have you tried it with the file I attached? Is there some formatting on that cell that makes it read differently?

@davich

Yes, I tried it and it reads perfectly. This is the code I have written:

require 'creek'
@SanityWorkBook = Creek::Book.new "testfile.xlsx"
@sheet = @SanityWorkBook.sheets.detect {|sheet, index| sheet.name.eql? "Inspirations"}
@Header, *@datasets = @sheet.rows.to_enum.map(&:values)

puts @Header
puts @datasets

This is the output I get

abc
def
ghi
j&k

Thanks for helping me debug this issue. I run exactly the same code on my machine and get the ampersand escaped.
creek-2.4.1 on OSX

$ cat xlsx.rb 
require 'creek'
@SanityWorkBook = Creek::Book.new "testfile.xlsx"
@sheet = @SanityWorkBook.sheets.detect {|sheet, index| sheet.name.eql? "Inspirations"}
@Header, *@datasets = @sheet.rows.to_enum.map(&:values)

puts @Header
puts @datasets

$ ruby xlsx.rb 
abc
def
ghi
j&amp;k

I've created a PR with a couple of tests. One passes, the other fails: #75
It seems to be related to the working file using shared strings (read back as j&k) and the broken one using inlineStrings (read back as j&amp;k)

I fixed the issue with #76
Using

node.read
node.value

instead of

node.inner_xml

Can you please confirm this is the right approach? Thanks
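The difference between the two calls comes down to reading an element's serialized markup versus reading the decoded value of its text node. The sketch below illustrates that distinction with REXML rather than the Nokogiri reader Creek uses, purely so it runs without extra gems; the cell XML is a made-up inline-string fragment, and the REXML calls are an analogy for `inner_xml` and `value`, not Creek's actual code.

```ruby
require 'rexml/document'

# A made-up inline-string cell, roughly as it might appear in a worksheet part.
cell_xml = '<c r="A4" t="inlineStr"><is><t>j&amp;k</t></is></c>'

# Grab the text node inside <t>.
text_node = REXML::Document.new(cell_xml).root.elements['is/t'].get_text

serialized = text_node.to_s   # markup form: entities are still escaped
decoded    = text_node.value  # text form: entities are resolved

puts serialized
puts decoded
```

Reading the markup form keeps `&amp;` because that is how an ampersand must be written inside XML, while reading the text value gives back the character the spreadsheet author typed.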

Hi @Rajagopalan-M , Just following up on this. How should we proceed? Thanks

I am able to read, I don't have any problem.

Hi @Rajagopalan-M
I have a PR with a failing test in #75 - This reproduces the issue.
Can you please look at it?
Thanks

Can you merge #76 ?

Merged. Thanks for your contribution.