SheetJS / sheetjs

📗 SheetJS Spreadsheet Data Toolkit -- New home https://git.sheetjs.com/SheetJS/sheetjs

Home page: https://sheetjs.com/

sheet_to_json: inconsistent blank cell parsing

gliluaume opened this issue

Hello,

I think there is an issue with the sheet_to_json function that depends on the input file. In some cases, blank cells are parsed to an empty string in the output JavaScript object; in other cases, these cells are undefined (the key is absent from the output object).

I do not set the defval option, as I want undefined values to stay undefined.

You can see an example in following repo https://github.com/gliluaume/bug-xlsx

  • bug-xlsx.xlsx is supposed to be an XLSX file (generated from Google Sheets)
  • bug-xlsx-excel5.xls is an Excel 5.0 file

Using Node.js (see index.js), the output is consistent, but not as expected:

./bug-xlsx.xlsx
[
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'BAR',
    Stuff6: 'x',
    Stuff7: '',
    Stuff8: '',
    Stuff9: 'A'
  },
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'BUS',
    Stuff6: 'x',
    Stuff7: '',
    Stuff8: '',
    Stuff9: 'A'
  },
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'APPL',
    Stuff6: 'x',
    Stuff7: '',
    Stuff8: '',
    Stuff9: 'A'
  }
]
./bug-xlsx-excel5.xls
[
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'BAR',
    Stuff6: 'x',
    Stuff9: 'A'
  },
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'BUS',
    Stuff6: 'x',
    Stuff9: 'A'
  },
  {
    Stuff1: 123,
    Stuff3: 456,
    Stuff4: 'AED',
    Stuff5: 'APPL',
    Stuff6: 'x'
  }
]
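To pin down exactly which keys differ between the two outputs above, a small comparison helps. This is a minimal sketch: `keyDiff` is a hypothetical helper, and the two sample rows are copied from the first object of each output printed above.

```javascript
// Hypothetical helper: keys present in row `a` but missing from row `b`.
function keyDiff(a, b) {
  return Object.keys(a).filter((k) => !Object.prototype.hasOwnProperty.call(b, k));
}

// First row of each output, copied from the report above.
const xlsxRow = { Stuff1: 123, Stuff3: 456, Stuff4: 'AED', Stuff5: 'BAR',
                  Stuff6: 'x', Stuff7: '', Stuff8: '', Stuff9: 'A' };
const xlsRow = { Stuff1: 123, Stuff3: 456, Stuff4: 'AED', Stuff5: 'BAR',
                 Stuff6: 'x', Stuff9: 'A' };

console.log(keyDiff(xlsxRow, xlsRow)); // [ 'Stuff7', 'Stuff8' ]
```

The only extra keys on the XLSX side are the ones holding `''`, which points at the cells themselves rather than at the header row.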

Using the library in a browser (Firefox 104.0.2, 64-bit, on Windows 10), I see the same behaviour.

The files themselves have different contents. You can inspect them with the parser:

% for F in ./bug-xlsx*; do echo $F; node -pe 'var wb = require("xlsx").readFile("'$F'"); wb.Sheets[wb.SheetNames[0]].G2'; done
./bug-xlsx-excel5.xls
undefined
./bug-xlsx.xlsx
{ t: 's', v: '', r: '<t></t>', h: '', w: '' }
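The same check can be run over a whole sheet rather than a single cell. Assuming the default (non-dense) representation, a SheetJS worksheet is a plain object keyed by A1-style addresses, plus metadata keys beginning with `!` such as `!ref`. The sketch below uses a hypothetical helper, `findBlankStringCells`, to list string cells whose value is `''`; the `sheet` literal is a hand-written stand-in modelled on the G2 cell printed above, not data read from the attached files.

```javascript
// Hypothetical helper: addresses of string cells whose value is ''.
// Keys starting with '!' (e.g. '!ref', '!merges') are sheet metadata, not cells.
function findBlankStringCells(sheet) {
  return Object.keys(sheet)
    .filter((addr) => !addr.startsWith('!'))
    .filter((addr) => sheet[addr].t === 's' && sheet[addr].v === '');
}

// Hand-written stand-in shaped like the G2 cell printed above.
const sheet = {
  '!ref': 'A1:I2',
  A2: { t: 'n', v: 123 },
  G2: { t: 's', v: '', r: '<t></t>', h: '', w: '' },
  H2: { t: 's', v: '' },
  I2: { t: 's', v: 'A' },
};

console.log(findBlankStringCells(sheet)); // [ 'G2', 'H2' ]
```

With the library loaded, the helper could be pointed at a real sheet as `findBlankStringCells(wb.Sheets[wb.SheetNames[0]])`, where `wb` comes from `readFile` as in the command above.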

The XLSX file literally has blank strings:

<!-- xl/worksheets/sheet1.xml -->
<c r="G2" s="4" t="s"><v>11</v></c>
<!-- xl/sharedStrings.xml: indices count from 0, so <v>11</v> is the 12th <si> entry -->
<si><t></t></si>
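Putting the two observations together: in the XLS file cell G2 does not exist at all, while in the XLSX file it exists as a string cell holding `''`. The sketch below is a deliberately simplified model of that distinction, not SheetJS's actual implementation. `buildRow` is hypothetical, and it only mimics the observed effect of the `defval` option: absent cells produce no key unless a default is given, while present cells are copied even when empty.

```javascript
// Simplified, hypothetical model of turning one sheet row into an object.
// headers: column names; cells: cell objects, or undefined where no cell exists.
function buildRow(headers, cells, defval) {
  const row = {};
  headers.forEach((name, i) => {
    const cell = cells[i];
    if (cell === undefined) {
      // Absent cell: no key unless a default value was requested.
      if (defval !== undefined) row[name] = defval;
      return;
    }
    // Present cell: its value is copied, even when it is an empty string.
    row[name] = cell.v;
  });
  return row;
}

const headers = ['Stuff6', 'Stuff7', 'Stuff9'];
const xlsCells = [{ t: 's', v: 'x' }, undefined, { t: 's', v: 'A' }];
const xlsxCells = [{ t: 's', v: 'x' }, { t: 's', v: '' }, { t: 's', v: 'A' }];

console.log(buildRow(headers, xlsCells));       // { Stuff6: 'x', Stuff9: 'A' }
console.log(buildRow(headers, xlsxCells));      // { Stuff6: 'x', Stuff7: '', Stuff9: 'A' }
console.log(buildRow(headers, xlsCells, null)); // { Stuff6: 'x', Stuff7: null, Stuff9: 'A' }
```

In this model, leaving `defval` unset cannot hide the XLSX blanks, because they are real cells rather than missing ones.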

Google Sheets must be translating the unspecified cells to blank string cells. You can always programmatically delete those fields from the output of sheet_to_json:

// obj is the array returned by sheet_to_json; drop every key whose value is ""
const newobj = obj.map(r => Object.fromEntries(Object.entries(r).filter(([, v]) => v !== "")));

Hello,
Thank you for your quick response!
Yes, for now I handle it by removing properties with an empty-string value.