Skips empty columns to the left

Question

Skips empty columns to the left

Gengisca opened this issue 3 years ago · comments

I've an xlsx
EMPTY_COLUMN.xlsx
with two columns, first is always empty, and the second is not empty.
The response is this one:
0: ['Rome']
1: ['Madrid']
2: ['London']
3: ['Istanbul']
4: ['Paris']
5: ['Amsterdam']

This behavior is incorrect as information from the first column is lost.
I would like to have an answer like this:
0: (2) [null, 'Rome']
1: (2) [null, 'Madrid']
2: (2) [null, 'London']
3: (2) [null, 'Istanbul']
4: (2) [null, 'Paris']
5: (2) [null, 'Amsterdam']

Nikolay · Answer 1 · Wed Nov 17 2021 23:27:34 GMT+0800 (China Standard Time)

See if you're using the latest version.

readXlsxFile() now doesn't skip empty rows or columns: it only skips empty rows or columns at the end, but not in the beginning and not in the middle as it used to.

Gengisca · Answer 2 · Wed Nov 17 2021 23:33:47 GMT+0800 (China Standard Time)

Hello,
I'm using this CDN file:
https://unpkg.com/read-excel-file@5.x/bundle/read-excel-file.min.js

Giancarlo

Nikolay · Answer 3 · Wed Nov 17 2021 23:34:52 GMT+0800 (China Standard Time)

Then provide a demo somewhere on codesandbox.io and an example file

Gengisca · Answer 4 · Wed Nov 17 2021 23:45:43 GMT+0800 (China Standard Time)

You can use
https://catamphetamine.github.io/read-excel-file/
with this file:
EMPTY_COLUMN.xlsx

Nikolay · Answer 5 · Wed Nov 17 2021 23:47:08 GMT+0800 (China Standard Time)

I see.

[
  [
    "Rome"
  ],
  [
    "Madrid"
  ],
  [
    "London"
  ],
  [
    "Istanbul"
  ],
  [
    "Paris"
  ],
  [
    "Amsterdam"
  ]
]

Nikolay · Answer 6 · Thu Nov 18 2021 01:23:48 GMT+0800 (China Standard Time)

I've unzipped the Excel file and checked out the xl/worksheets/sheet1.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{00000000-0001-0000-0000-000000000000}">
   <dimension ref="B1:B16" />
   <sheetViews>
      <sheetView tabSelected="1" workbookViewId="0" />
   </sheetViews>
   <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25" />
   <cols>
      <col min="1" max="1" width="54.42578125" customWidth="1" />
      <col min="2" max="2" width="42.140625" customWidth="1" />
   </cols>
   <sheetData>
      <row r="1" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B1" t="s">
            <v>0</v>
         </c>
      </row>
      <row r="2" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B2" t="s">
            <v>1</v>
         </c>
      </row>
      <row r="3" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B3" t="s">
            <v>2</v>
         </c>
      </row>
      <row r="4" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B4" t="s">
            <v>3</v>
         </c>
      </row>
      <row r="5" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B5" t="s">
            <v>4</v>
         </c>
      </row>
      <row r="6" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B6" t="s">
            <v>5</v>
         </c>
      </row>
      <row r="16" spans="2:2" ht="19.5" customHeight="1" x14ac:dyDescent="0.25" />
   </sheetData>
   <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

There, one can see it has <dimension ref="B1:B16"/>.
What does it mean?
Most likely that the sheet spans from column B1 to column B16.

What should the library do in that case?
It could "pad" the data with the missing columns A1...dimensions[0].
But would that be what the creators of the file format intended?
Perhaps that <dimension ref="B1:B16"/> element is there to instruct the program that all other columns should be ignored.

So I won't consider this as a bug.
It could be a feature-flag, something like ignoreEmptyColumns: false, but I don't want to add such a feature in the library for now.
Anyway, the ticket can stay open.

Gengisca · Answer 7 · Fri Nov 19 2021 04:32:46 GMT+0800 (China Standard Time)

Hi, I've found this doc page:
https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.sheetdimension?view=openxml-2.8.1
In effect Dimension specifies the used range of the worksheet.
But... it is optional, and for sure Dimension doesn't instruct the program that all other columns should be ignored.

In my opinion it is really important to have the worksheet in the browser exactly as the Excel one.
Adding 'ignoreEmptyColumns: false' is the solution!

In the case you haven't time or don't want to add this feature, can you tell me which modules I have to change?

Thank you
Giancarlo

PS I have already forked, but it was an error...

Nikolay · Answer 8 · Fri Nov 19 2021 04:38:38 GMT+0800 (China Standard Time)

@Gengisca Hmm, nice digging.
So, it's really something of an advice for the spreadsheet viewing software.
They don't provide any incentives on what problems would it solve.
Maybe it's an optimization thing or something.
Perhaps this way, the spreadsheet viewer could scroll to the right when opening such file so that B is the first visible column.
I'll fix then.

Nikolay · Answer 9 · Fri Nov 19 2021 06:49:51 GMT+0800 (China Standard Time)

published read-excel-file@5.2.25

Gengisca · Answer 10 · Sat Nov 20 2021 02:09:22 GMT+0800 (China Standard Time)

Hello,
Without words ... very fast !!!
Thank you very much!
A big greeting from Italy
Giancarlo