crealytics / spark-excel

A Spark plugin for reading and writing Excel files

New Case on Large Number Being Captured As Scientific Notation

DamonYip9891 opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The Excel file (.xlsx format) contains three columns. A screenshot and the sample file are attached. Each column behaves as follows:
  • The first column is read correctly as a string.
  • The second column is displayed in General format, as described in issue #126, and is read correctly once the option "usePlainNumberFormat" is set to "true".
  • In the third column, one value is displayed in scientific notation, while the formula bar shows it as "230714073456". The other values in this column are displayed in General format. No matter which options are set, this value cannot be read correctly.

[Screenshot of the sample sheet]
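You can check whether the file actually stores the full value, independently of Spark, because an .xlsx file is a ZIP archive whose worksheets are XML. The sketch below uses only the Python standard library and returns the raw `<v>` text of each cell. It assumes the data sits in the first worksheet part, `xl/worksheets/sheet1.xml`, and it does not resolve shared-string indexes (cells of type `s`). The helper name `raw_cell_values` is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by worksheet XML inside .xlsx files
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def raw_cell_values(source, sheet_xml="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: (cell_type, raw_text)} read directly from sheet XML.

    `source` is a path or binary file object for an .xlsx file. Hypothetical
    helper: shared-string cells (type 's') return the string-table index,
    not the text, and number formats are ignored entirely.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read(sheet_xml))
    cells = {}
    for cell in root.iterfind(".//m:sheetData/m:row/m:c", NS):
        value = cell.find("m:v", NS)
        if value is not None:
            # A missing 't' attribute means the cell holds a number
            cells[cell.get("r")] = (cell.get("t", "n"), value.text)
    return cells
```

Calling `raw_cell_values("sample.xlsx")` and looking up the affected third-column cell shows what the file stores. If the raw text is `230714073456`, the file holds the full number, and the scientific notation comes only from how the display format is applied when the cell is read.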

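My PySpark code:

```python
# Assumes an existing SparkSession `spark` (as on Databricks) and a
# `file_path` variable pointing to the .xlsx file.
df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .option("dataAddress", "'sheet1'!A1")  # start reading at cell A1 of sheet1
    .option("usePlainNumberFormat", "true")  # format numbers without scientific notation
    .load(file_path)
)
```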

sample.xlsx
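If the third column can be obtained as a number instead of as formatted text, the full integer survives. 230714073456 is below 2**53, so a double holds it exactly, and only its string rendering needs fixing. This report does not confirm whether spark-excel returns the raw double for this cell (for example, when a numeric type is declared in a user-supplied schema). The hypothetical helper below renders a float in positional notation. It is a sketch, not part of the spark-excel API.

```python
from decimal import Decimal
from typing import Optional


def to_plain_string(value: Optional[float]) -> Optional[str]:
    """Render a float in positional (non-scientific) notation.

    Hypothetical helper, not part of spark-excel. Whole numbers lose their
    trailing '.0', so 230714073456.0 becomes '230714073456'.
    """
    if value is None:  # pass nulls through unchanged
        return None
    # repr() yields the shortest decimal string that round-trips to the same
    # float, e.g. '230714073456.0' or '1e+20'
    d = Decimal(repr(value))
    # The 'f' format spec forces positional notation for a Decimal
    text = format(d, "f")
    # Drop trailing zeros and a dangling decimal point, but only after a '.'
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text
```

In PySpark, the helper could be wrapped as `pyspark.sql.functions.udf(to_plain_string, "string")` and applied to a double column. A native cast would likely be faster, but the UDF keeps the formatting rule explicit.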

Expected Behavior

The third column should be read into the DataFrame as a string. The value "230714073456", which is displayed in scientific notation, should be read in full.
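The value has to be read in full at load time, because the scientific-notation text cannot be turned back into the original number. Suppose the cell's display text is "2.30714E+11". This is a hypothetical example, since the exact text depends on column width and is not given in the report. Parsing that text back loses the last five digits:

```python
stored = 230714073456       # value shown in Excel's formula bar
displayed = "2.30714E+11"   # hypothetical General-format display text

# Converting the display text back to a number keeps only six significant digits
recovered = int(float(displayed))
print(recovered)            # 230714000000
print(stored - recovered)   # 73456
```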

Steps To Reproduce

No response

Environment

- Spark version: 3.2.1
- Spark-Excel version: com.crealytics:spark-excel_2.12:3.4.1_0.19.0
- OS: Databricks on AWS
- Cluster environment: 64 GB, 8 cores, DBR 10.4 LTS (aarch64), Spark 3.2.1, Scala 2.12

Anything else?

No response