crealytics / spark-excel

A Spark plugin for reading and writing Excel files



Incorrect Data Frame creation

akshitarora4259 opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am working with an Excel file that contains data in the following format:

The first cell of the first row contains a note describing the data in the file.
The second row is empty.
From the third row onwards, the sheet contains the actual data, which has 5 columns.
When I create a DataFrame from this Excel file using the following code snippet:
df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")

Although my data contains 5 columns, the resulting DataFrame has only 1 column.

Expected Behavior

The generated DataFrame should include all the data columns in the file, not just the single column from the first row.
It should contain the 5 columns of data present from the 3rd row onwards.

Steps To Reproduce

df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")
Attachment: incorrect dataframe excel.xlsx

Environment

- Spark version:
- Spark-Excel version:
- OS:
- Cluster environment:

Anything else?

No response

Hi @akshitarora4259,

When you specify the dataAddress, point it at the location of the data within the sheet. For example, if the data starts on the third row, your dataAddress would be something like dataAddress="'Sheet3'!A3:E9999999". Let me know if this works for you.
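The range in that suggestion can also be built programmatically. What follows is a minimal sketch. It assumes the data block starts in column A and reuses 9999999 as an open-ended last row. `column_letter`, `build_data_address` and `read_excel_range` are hypothetical helper names and are not part of spark-excel. Only the `com.crealytics.spark.excel` format and the `header` and `dataAddress` options come from this issue.

```python
def column_letter(n: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if n < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def build_data_address(sheet: str, first_row: int, num_cols: int,
                       first_col: int = 1, last_row: int = 9999999) -> str:
    """Hypothetical helper: build a range like 'Sheet3'!A3:E9999999."""
    start = f"{column_letter(first_col)}{first_row}"
    end = f"{column_letter(first_col + num_cols - 1)}{last_row}"
    return f"'{sheet}'!{start}:{end}"


def read_excel_range(spark, path: str, data_address: str):
    """Hypothetical helper: read only the given range with spark-excel.

    Assumes the spark-excel plugin is on the Spark classpath.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", False)
        .option("dataAddress", data_address)
        .load(path)
    )


# Data starts on row 3 and spans 5 columns (A..E), as described in the issue.
address = build_data_address("Sheet3", first_row=3, num_cols=5)
print(address)  # 'Sheet3'!A3:E9999999
# df = read_excel_range(spark, "file.xlsx", address)
```

Pointing the range at row 3 means the note in row 1 and the blank row 2 fall outside the area being read. The reader should then see only the 5-column data block.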