[BUG] Excel File with Macros Detected as "Potentially" Malicious. Unable to read Excel as a result.
nova-jj opened this issue · comments
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Within an Azure Databricks environment, we use this library to read Excel files stored in a Storage Account. The files are accessed using either the ABFSS or DBFS protocol, and the error occurs with both, suggesting this is a file issue and not a protocol issue.
Attempting to read the file with newer versions of the spark-excel library results in the following error, caused by macros in the workbook: java.io.IOException: The file appears to be potentially malicious. "This file embeds more internal file entries than expected."
We have reverted to a previous version that does not raise this error and are looking for a way to bypass the macro detection. Our workbook does contain macros, but they are a required part of the workbook.
Expected Behavior
Reading the file into a DataFrame should not raise this error, or there should be an option to override the macro detection and force-read a file that is flagged as "potentially" malicious.
Steps To Reproduce
The following Python code produces our error:
# `spark` is the SparkSession predefined in Databricks notebooks.
file_path = "dbfs:/FileStore/our_excel_file.xlsm"
df = spark.read.format("com.crealytics.spark.excel").option("header", "true").load(file_path)
df = df.toPandas()
Environment
- Spark version: 3.4.1 via Databricks Runtime 13.3
- Spark-Excel version: 3.5.0_0.20.3
- OS: Windows locally, but the code runs remotely on Databricks clusters
- Cluster environment: Multiple cluster configurations representing dev/stg/prd using the same Databricks Runtime and Spark Versions.
Anything else?
We have reverted to the previous version, with Maven coordinates com.crealytics:spark-excel_2.12:0.13.7, for our install, which does not produce this issue.
spark-excel doesn't do anything in that regard.
It must be an upstream library that performs this check. Can you try to find out if this comes from POI?
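One way to follow up on this question is to check how many internal entries the workbook actually contains, since the error text complains about the number of "internal file entries" rather than about macros as such. An .xlsx/.xlsm file is a zip archive, so the entries can be counted with Python's standard library. The sketch below is only a diagnostic aid: the function names are hypothetical, and the threshold of 1000 is an assumed value that is not confirmed anywhere in this thread.

```python
import zipfile

# Assumption: the upstream check rejects files above some entry count.
# 1000 is a guess used for illustration, not a value confirmed in this issue.
ASSUMED_ENTRY_LIMIT = 1000


def count_zip_entries(path_or_file):
    """Return the number of internal entries in an .xlsx/.xlsm package."""
    with zipfile.ZipFile(path_or_file) as package:
        return len(package.infolist())


def exceeds_limit(path_or_file, limit=ASSUMED_ENTRY_LIMIT):
    """Report whether the package holds more entries than the given limit."""
    return count_zip_entries(path_or_file) > limit
```

On a cluster where the file is reachable as a local path (for example through a `/dbfs/...` mount, which is an assumption about the setup), calling `count_zip_entries` on that path would show whether the workbook is unusually large in terms of embedded parts. A high count would point to an entry-count limit in an upstream library rather than to macro detection in spark-excel.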