catamphetamine / read-excel-file

Read *.xlsx files in a browser or Node.js. Parse to JSON with a strict schema.

Home Page:https://catamphetamine.gitlab.io/read-excel-file/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Skips empty columns to the left

Gengisca opened this issue · comments

I've an xlsx
EMPTY_COLUMN.xlsx
with two columns, first is always empty, and the second is not empty.
The response is this one:
0: ['Rome']
1: ['Madrid']
2: ['London']
3: ['Istanbul']
4: ['Paris']
5: ['Amsterdam']

This behavior is incorrect as information from the first column is lost.
I would like to have an answer like this:
0: (2) [null, 'Rome']
1: (2) [null, 'Madrid']
2: (2) [null, 'London']
3: (2) [null, 'Istanbul']
4: (2) [null, 'Paris']
5: (2) [null, 'Amsterdam']

See if you're using the latest version.

  • readXlsxFile() now doesn't skip empty rows or columns: it only skips empty rows or columns at the end, but not in the beginning and not in the middle as it used to.

Then provide a demo somewhere on codesandbox.io and an example file

I see.

[
  [
    "Rome"
  ],
  [
    "Madrid"
  ],
  [
    "London"
  ],
  [
    "Istanbul"
  ],
  [
    "Paris"
  ],
  [
    "Amsterdam"
  ]
]

I've unzipped the Excel file and checked out the xl/worksheets/sheet1.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{00000000-0001-0000-0000-000000000000}">
   <dimension ref="B1:B16" />
   <sheetViews>
      <sheetView tabSelected="1" workbookViewId="0" />
   </sheetViews>
   <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25" />
   <cols>
      <col min="1" max="1" width="54.42578125" customWidth="1" />
      <col min="2" max="2" width="42.140625" customWidth="1" />
   </cols>
   <sheetData>
      <row r="1" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B1" t="s">
            <v>0</v>
         </c>
      </row>
      <row r="2" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B2" t="s">
            <v>1</v>
         </c>
      </row>
      <row r="3" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B3" t="s">
            <v>2</v>
         </c>
      </row>
      <row r="4" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B4" t="s">
            <v>3</v>
         </c>
      </row>
      <row r="5" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B5" t="s">
            <v>4</v>
         </c>
      </row>
      <row r="6" spans="2:2" x14ac:dyDescent="0.25">
         <c r="B6" t="s">
            <v>5</v>
         </c>
      </row>
      <row r="16" spans="2:2" ht="19.5" customHeight="1" x14ac:dyDescent="0.25" />
   </sheetData>
   <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

There, one can see it has <dimension ref="B1:B16"/>.
What does it mean?
Most likely that the sheet spans from column B1 to column B16.

What should the library do in that case?
It could "pad" the data with the missing columns A1...dimensions[0].
But would that be what the creators of the file format intended?
Perhaps that <dimension ref="B1:B16"/> element is there to instruct the program that all other columns should be ignored.

So I won't consider this as a bug.
It could be a feature-flag, something like ignoreEmptyColumns: false, but I don't want to add such a feature in the library for now.
Anyway, the ticket can stay open.

Hi, I've found this doc page:
https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.sheetdimension?view=openxml-2.8.1
In effect Dimension specifies the used range of the worksheet.
But... it is optional, and for sure Dimension doesn't instruct the program that all other columns should be ignored.

In my opinion it is really important to have the worksheet in the browser exactly as the Excel one.
Adding 'ignoreEmptyColumns: false' is the solution!

In the case you haven't time or don't want to add this feature, can you tell me which modules I have to change?

Thank you
Giancarlo

PS I have already forked, but it was an error...

@Gengisca Hmm, nice digging.
So, it's really something of an advice for the spreadsheet viewing software.
They don't provide any incentives on what problems would it solve.
Maybe it's an optimization thing or something.
Perhaps this way, the spreadsheet viewer could scroll to the right when opening such file so that B is the first visible column.
I'll fix then.

published read-excel-file@5.2.25

Hello,
Without words ... very fast !!!
Thank you very much!
A big greeting from Italy
Giancarlo