crealytics / spark-excel

A Spark plugin for reading and writing Excel files



[BUG] Date columns are not read properly if "inferSchema" is set to false

TarekSalha opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am based in Germany, which uses the date format dd.mm.yyyy (e.g. 21.04.2022 for 21 April 2022). I want to read an Excel file using the V2 data source inside an Azure Synapse Spark cluster. The Excel file contains a column of type date.
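For reference, the two conventions differ only in field order and separator. A minimal sketch using only the Python standard library that renders the sample date from this report both ways:

```python
from datetime import date

# The sample date from this report: 21 April 2022.
sample = date(2022, 4, 21)

# German convention dd.mm.yyyy, as the reporter expects.
german = sample.strftime("%d.%m.%Y")

# US convention mm/dd/yyyy, as observed in the DataFrame.
us = sample.strftime("%m/%d/%Y")

print(german)  # 21.04.2022
print(us)      # 04/21/2022
```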

df = (spark.read.format("excel")  # spark-excel V2 data source
    .option("header", "true")
    .option("inferSchema", "false")  # read every column as a string
    .load("myWorkbook.xlsx"))

When I inspect the resulting DataFrame, the sample date above appears as "04/21/2022".

Expected Behavior

If no schema is inferred and string is used as the data type, I would expect the connector to offer a localization option so that it returns "21.04.2022" in my string column instead of "04/21/2022".
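Until such an option exists, one possible workaround is to re-format the string after loading. A minimal sketch, assuming every non-empty value arrives in the US mm/dd/yyyy form shown above; the helper name `us_to_german_date` is hypothetical and not part of spark-excel:

```python
from datetime import datetime
from typing import Optional


def us_to_german_date(value: Optional[str]) -> Optional[str]:
    """Convert a US-style 'mm/dd/yyyy' string to German 'dd.mm.yyyy'.

    Hypothetical helper: assumes every non-empty value uses the
    mm/dd/yyyy layout observed in this report.
    """
    if value is None or value == "":
        # Keep empty Excel cells empty instead of raising.
        return value
    # Raises ValueError if the string does not match mm/dd/yyyy.
    parsed = datetime.strptime(value, "%m/%d/%Y")
    return parsed.strftime("%d.%m.%Y")


print(us_to_german_date("04/21/2022"))  # 21.04.2022
```

Inside Spark, the same conversion should be possible without a Python UDF by combining the built-in `to_date` and `date_format` functions from `pyspark.sql.functions`, e.g. `date_format(to_date(col("myDate"), "MM/dd/yyyy"), "dd.MM.yyyy")`, where `myDate` is a placeholder column name. Note that Spark datetime patterns use `MM` for months; `mm` means minutes.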

Steps To Reproduce

No response

Environment

- Spark version: 3.1
- Spark-Excel version: com.crealytics:spark-excel_2.12:3.1.3_0.18.5
- OS: ?? (Synapse Spark Cluster)

Anything else?

No response

Not sure if I understand correctly. Where are you seeing the 04/21/2022?
Can you run df.printSchema()?