Text read from Excel files is HTML-escaped
davich opened this issue · comments
I have an xlsx file with an ampersand in it (&) and when I read the file using Creek, it becomes &amp;. Is this kind of HTML escaping expected? Is there a way of turning it off? Thanks
book = Creek::Book.new file_path
book.sheets[0].rows.each do |row|
p row.values
end
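For anyone hitting this on an affected version, a possible stopgap is to unescape string cells after reading them. This is a minimal sketch, assuming each row is a Hash of cell reference to value (as the `row.values` call above suggests); `unescape_row` is a hypothetical helper, not part of Creek.

```ruby
require 'cgi'

# Hypothetical helper (not part of Creek): undo HTML escaping on string cells.
# Non-string values (numbers, dates, nil) are passed through unchanged.
def unescape_row(row)
  row.transform_values { |v| v.is_a?(String) ? CGI.unescapeHTML(v) : v }
end

# Usage with plain hashes standing in for Creek rows:
p unescape_row({ 'A1' => 'j&amp;k', 'B1' => 3 }).values
```

Note that this would also alter a cell whose real text is a literal `&amp;`, so it only makes sense while the reader is known to return escaped text.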
Hi, I put the value a&b in a cell and I can read it back as it is. I am not seeing this problem. Can you check again?
Have you tried it with the file I attached? Is there some formatting on that cell that makes it read differently?
Yes, I tried it and it reads perfectly. This is the code I have written:
require 'creek'
@SanityWorkBook = Creek::Book.new "testfile.xlsx"
@sheet = @SanityWorkBook.sheets.detect {|sheet, index| sheet.name.eql? "Inspirations"}
@Header, *@datasets = @sheet.rows.to_enum.map(&:values)
This is the output I get
abc
def
ghi
j&k
Thanks for helping me debug this issue. I ran exactly the same code on my machine and got the ampersand escaped.
creek-2.4.1 on OSX
$ cat xlsx.rb
require 'creek'
@SanityWorkBook = Creek::Book.new "testfile.xlsx"
@sheet = @SanityWorkBook.sheets.detect {|sheet, index| sheet.name.eql? "Inspirations"}
@Header, *@datasets = @sheet.rows.to_enum.map(&:values)
puts @Header
puts @datasets
$ ruby xlsx.rb
abc
def
ghi
j&amp;k
I've created a PR with a couple of tests. One passes, the other fails: #75
The difference seems to be that the working file uses shared strings (j&k) and the broken one uses inlineStrings (j&amp;k).
I fixed the issue with #76
Using
node.read
node.value
instead of
node.inner_xml
Can you please confirm this is the right approach? Thanks
Hi @Rajagopalan-M , Just following up on this. How should we proceed? Thanks
I am able to read the file. I don't have any problem.
Hi @Rajagopalan-M
I have a PR with a failing test in #75 - This reproduces the issue.
Can you please look at it?
Thanks
Can you merge #76?
Merged. Thanks for your contribution.