crealytics / spark-excel

A Spark plugin for reading and writing Excel files

[FEATURE] Optimize JAR size

alessandrorimoldi opened this issue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Hello, I was checking your library and noticed that the final JAR is quite big (30MB).
The library is very useful, but it is heavy to include in a fat JAR; maybe it can be optimized a bit.
Checking on Maven, I noticed that:

  • you are including Scala as a compile dependency, but I think it could be marked as provided
  • you are including both poi-ooxml and poi-ooxml-lite (according to FAQ#3, I think only one of these should be included)

I don't know whether these tips help.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

Hi @alessandrorimoldi, do you have any restrictions that make the 30MB too big?
There might be a few tweaks to reduce the file size, but I've already managed to mess up the Uber-JAR packaging too many times to be motivated to try again 😉

Hi @nightscape, it's not a real restriction, but I wanted to use your library in one of my own libraries that is shared among all my projects. Since I build fat JARs for Spark and then upload them to the cluster, the extra 30MB is a bit annoying for my use case.
I don't know whether you have already tried the two things I listed above, but if they work you should be able to reduce the JAR size by 10MB, which is not a bad starting point.
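
For the fat-JAR use case described above, part of the overhead can possibly be trimmed on the consumer side without any change to spark-excel. The following is only a sketch of a consumer's build.sbt, not spark-excel's own build definition. It assumes the sbt-assembly plugin is already enabled, that the cluster provides Spark and the Scala standard library at runtime, and the version strings are placeholders to be replaced with the ones actually in use.

```scala
// build.sbt of the consuming project (hypothetical example).
// Assumes sbt-assembly is enabled in project/plugins.sbt.

// Placeholder versions: substitute the ones that match your cluster.
val sparkVersion      = "3.3.1"
val sparkExcelVersion = "3.3.1_0.18.5"

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster, so it is kept out of the fat JAR.
  "org.apache.spark" %% "spark-sql"   % sparkVersion % Provided,
  // spark-excel is not on the cluster, so it still has to be bundled.
  "com.crealytics"   %% "spark-excel" % sparkExcelVersion
)

// Leave the Scala standard library out of the assembly, even though it
// arrives as a transitive compile-scope dependency.
assemblyPackageScala / assembleArtifact := false
```

This only addresses the first point (Scala as a compile dependency). Whether one of the two POI artifacts can safely be dropped depends on how they relate to each other, so it is not attempted here.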