[BUG] Reading all Excel files in a folder with Spark Excel
xuhaosanqiu opened this issue
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I used the following code to read all the files in a folder, but it is slow:
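```scala
import java.io.File
import com.crealytics.spark.excel._ // provides the spark.read.excel(...) syntax

// Assumes `spark` (a SparkSession) and `directoryPath` are already defined.
val files = new File(directoryPath).listFiles.filter(_.getName.endsWith(".xls"))
var df = spark.emptyDataFrame
// Read each file separately and union it onto the accumulated DataFrame.
for ((file, index) <- files.zipWithIndex) {
  val temdf = spark.read.excel(
    header = true,
    dataAddress = "0!A1"
  ).load(file.toString)
  if (index == 0) {
    df = temdf
  } else {
    df = df.union(temdf)
  }
}
```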
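The file listing in the snippet above can fail on its own: `File.listFiles` returns `null` when the path does not denote a readable directory (which then throws a `NullPointerException` at `.filter`), the `.xls` check is case-sensitive, and a subdirectory whose name ends in `.xls` would also match. A minimal, null-safe sketch using only the JDK; the object `ExcelFiles` and the method `listExcelFiles` are hypothetical names for illustration, not part of spark-excel.

```scala
import java.io.File

object ExcelFiles {
  // Hypothetical helper: returns the absolute paths of the regular files directly
  // inside `directoryPath` whose names end with one of `extensions`
  // (compared case-insensitively), sorted by file name.
  def listExcelFiles(directoryPath: String, extensions: Seq[String] = Seq(".xls")): Seq[String] = {
    // listFiles returns null if the path is not a readable directory;
    // map that case to an empty result instead of a NullPointerException.
    val entries = Option(new File(directoryPath).listFiles).getOrElse(Array.empty[File])
    entries.toSeq
      .filter(_.isFile) // skip subdirectories, even if their names end in .xls
      .filter(f => extensions.exists(ext => f.getName.toLowerCase.endsWith(ext.toLowerCase)))
      .sortBy(_.getName)
      .map(_.getAbsolutePath)
  }
}
```

The returned paths can replace `files` in the loop (passing each path string to `load`). This only makes the listing safer; it does not remove the per-file union.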
Expected Behavior
Is it possible to read all the files in the folder directly? Unioning the DataFrames one by one is too time-consuming.
Steps To Reproduce
No response
Environment
- Spark version:
- Spark-Excel version:
- OS:
- Cluster environment
Anything else?
No response
Hi @xuhaosanqiu, you can try the v2 version:
https://github.com/crealytics/spark-excel#excel-api-based-on-datasourcev2
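A minimal sketch of what the V2 suggestion could look like, assuming the spark-excel DataSourceV2 reader is on the classpath under the short format name `excel` (as in the linked README section) and that, like other Spark file-based sources, it accepts a glob path. The helper `folderGlob` and the `*.xls` pattern are illustrative assumptions, not part of the library. The idea is to hand every matching file to a single `load` call instead of building one DataFrame per file and unioning them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadExcelFolder {
  // Hypothetical helper: builds a glob matching the files directly under the folder.
  def folderGlob(directoryPath: String, pattern: String = "*.xls"): String =
    s"${directoryPath.stripSuffix("/")}/$pattern"

  // Assumes the spark-excel V2 data source is available as format("excel").
  // All files matched by the glob are read by one load call, so no per-file union is needed.
  def readFolder(spark: SparkSession, directoryPath: String): DataFrame =
    spark.read
      .format("excel")
      .option("header", "true")
      .option("dataAddress", "0!A1") // same sheet/cell address as the original snippet
      .load(folderGlob(directoryPath))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-excel-folder")
      .master("local[*]")
      .getOrCreate()
    try readFolder(spark, args(0)).show()
    finally spark.stop()
  }
}
```

Using a glob rather than the bare folder path keeps non-Excel files in the folder out of the read; if every file in the folder is an Excel file, passing the folder path itself to `load` may also work with a file-based V2 source.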
Hello, when I read a local folder on Windows, I get a `java.io.FileNotFoundException` reporting insufficient permissions. Reading a single file works. Is there a solution?
```xml
<dependency>
  <groupId>com.crealytics</groupId>
  <artifactId>spark-excel_2.12</artifactId>
  <version>3.0.1_0.18.7</version>
</dependency>
```
Not sure if I can help with this, but you'd need to at least provide the full information from this page:
https://github.com/crealytics/spark-excel/blob/main/.github/ISSUE_TEMPLATE/generic.yml