Support DataFormatter in spring-batch-excel POI implementation
dusiema opened this issue · comments
DataFormatter enables the POI implementation in spring-batch-excel to read cell values as they appear in Excel, rather than returning the value with the type that Excel uses internally.
I would like to add this as an option to the PoiItemReader, so the user can choose to retrieve all values as Strings, exactly as they appear in Excel.
The reason is that I have numbers that I want to read as strings, which is currently not possible.
That might be even better than the conversion to String that is in place at the moment, although we might want to offer a choice between returning dates formatted or as long timestamps.
I'm reconsidering converting everything to String values. POI (and JXL) is perfectly capable of returning, for instance, a Date or long or whatever type. Currently we convert everything to String and, if needed, convert it back to Date, depending on the type of field you are binding to.
I'm considering extending the RowSet with getters for different types, much like java.sql.ResultSet. That would save the conversion Date -> String -> Date or whatever. You can then just ask for what you want: Date or long.
I am happy to provide a pull request if you provide some technical guidance on what you want the solution to look like.
We've got plenty of problems with this PoiSheet because we want numbers as Strings, but as they appear in the spreadsheet, so the double coming back from numeric cell types is not helpful at all.
And this is not only about dates. It also affects phone numbers. I am getting this phone number back as:
3545876647 -> 3.545876647E9
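The effect reported above can be reproduced with the JDK alone. The following is a minimal sketch, not code from spring-batch-excel: the class NumericCellDemo and its methods naive and asDisplayed are hypothetical names, and java.text.DecimalFormat only stands in for the formatting that POI's DataFormatter would apply based on the cell's format.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo class; not part of spring-batch-excel or POI.
class NumericCellDemo {

    // What happens when the double of a numeric cell is converted to a String directly.
    static String naive(double cellValue) {
        return String.valueOf(cellValue);
    }

    // Formats the double with an explicit pattern, as a stand-in for
    // formatting the value the way the spreadsheet displays it.
    static String asDisplayed(double cellValue, String pattern) {
        DecimalFormat format =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(cellValue);
    }

    public static void main(String[] args) {
        double phone = 3545876647d;
        System.out.println(naive(phone));            // scientific notation
        System.out.println(asDisplayed(phone, "0")); // plain digits
        System.out.println(naive(42d));              // trailing ".0"
        System.out.println(asDisplayed(42d, "0"));   // no trailing ".0"
    }
}
```

The pattern "0" is an assumption chosen for whole numbers such as phone numbers; a real implementation would take the format from the cell rather than hard-code it.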
Currently the RowSet only has the getColumnValue method, which returns a java.lang.String. I'm looking into an API more akin to java.sql.ResultSet, with getDate(int column) and getLong(int column). That way the user has more flexibility, and it potentially saves us from doing all the additional conversions.
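To make the idea concrete, here is a rough sketch of what typed getters could look like. It is not the actual RowSet API: TypedRowSet and ArrayBackedRowSet are hypothetical names, getString is an assumed counterpart to getColumnValue, and an Object[] stands in for the raw POI cells so that the example runs with the JDK only.

```java
import java.util.Date;

// Hypothetical interface; only getDate(int) and getLong(int) are named in the discussion.
interface TypedRowSet {
    String getString(int column); // assumed String accessor
    long getLong(int column);
    Date getDate(int column);
}

// Hypothetical implementation that keeps the raw values and converts only on request.
class ArrayBackedRowSet implements TypedRowSet {
    private final Object[] cells; // stand-in for the underlying Row or Cell[]

    ArrayBackedRowSet(Object... cells) {
        this.cells = cells;
    }

    @Override
    public String getString(int column) {
        return String.valueOf(cells[column]);
    }

    @Override
    public long getLong(int column) {
        // No detour through String: the numeric value is narrowed directly.
        return ((Number) cells[column]).longValue();
    }

    @Override
    public Date getDate(int column) {
        // The Date is handed back as stored, avoiding Date -> String -> Date.
        return (Date) cells[column];
    }
}
```

With this shape, a caller that wants a long or a Date asks for it directly, and the String conversion only happens when getString is called.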
This would need changes in the RowSet API as well as in the Sheet abstraction. The conversion is currently always done in the Sheet abstraction; instead, we should return the raw underlying Row or Cell[].
This change might also uncover some odd relations between RowSet and Sheet. Maybe we should consider having the separate Sheet instances also implement the RowSet interface, instead of having another intermediate object, DefaultRowSet.
However, we might defer this until we have removed the JXL support (see #19), as that would simplify the API and make it POI only.
This sounds like a great idea. Using the BeanWrapperRowMapper I have run into annoying instances of Dates being turned into milliseconds, numbers returned in scientific notation, and numbers with a '.0' added to the end. I am having problems figuring out how to apply this paragraph in the javadoc:
To customize the way that RowSet values are converted to the desired type for injecting into the prototype there are several choices. You can inject PropertyEditor instances directly through the customEditors property, or you can override the createBinder(Object) and initBinder(DataBinder) methods, or you can provide a custom RowSet implementation.
I added the DataFormatter in a branch of my fork.
Check it out here:
https://github.com/dusiema/spring-batch-extensions/tree/UseDataFormatter%239
This is what I am currently using in my project.
I just set a boolean in PoiItemReader to decide whether I want to use the DataFormatter or not. This is then used in PoiSheet.
I could of course open a pull request to get this into upstream, but I don't know if this is "elegant" enough.
Thank you dusiema,
I agree that your solution isn't elegant but only because there is no way to configure the formatter. Saying that, it worked great for me!
Glad it was useful. Configuring the formatter could be done by defining a DataFormatter instead of a boolean and using it whenever one is defined. Then one could just add a format through Spring configuration.
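A minimal sketch of that "optional formatter instead of a boolean" idea follows. It is an illustration under assumptions, not the code from the branch: FormattingCellReader is a hypothetical class, and java.text.Format is used as a JDK-only stand-in for POI's DataFormatter, which is a different type.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.Format;
import java.util.Locale;

// Hypothetical stand-in for the reader/sheet pair; not the PoiItemReader API.
class FormattingCellReader {

    // null means "no formatter configured": keep the plain String conversion.
    private Format formatter;

    // A setter like this could be populated from Spring configuration.
    void setFormatter(Format formatter) {
        this.formatter = formatter;
    }

    String read(Object rawCellValue) {
        if (formatter == null) {
            return String.valueOf(rawCellValue);
        }
        return formatter.format(rawCellValue);
    }

    public static void main(String[] args) {
        FormattingCellReader reader = new FormattingCellReader();
        System.out.println(reader.read(3545876647d)); // unformatted double

        reader.setFormatter(
                new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT)));
        System.out.println(reader.read(3545876647d)); // formatted as plain digits
    }
}
```

Compared with a boolean flag, an injectable formatter keeps the default behaviour when nothing is set and lets the user decide how values are rendered when one is.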
Internally we now use a DataFormatter or FormulaEvaluator, depending on the type of the column. So this should be fixed with #65.