spring-projects / spring-batch-extensions

Spring Batch Extensions

Support DataFormatter in spring-batch-excel POI implementation

dusiema opened this issue · comments

commented

DataFormatter enables the POI version in spring-batch-excel to read cell values as they appear in Excel (rather than returning the value with the type that Excel uses internally).

I would like to add this as an option to the PoiItemReader, so the user can choose to retrieve all values as Strings, just the way they appear in Excel.

The reason is that I have numbers that I want to read as strings, which is currently not possible.

That might be even better than the to-String conversion that is in place at the moment, although we might want to offer a choice between returning dates as formatted text or as timestamps in long.

I'm reconsidering converting everything to String values. POI (and JXL) is perfectly capable of returning, for instance, a Date or a long or whatever type. Currently we convert everything to String and, if needed, convert it back to Date, depending on the type of field you are binding to.

I'm considering extending the RowSet with getters for different types, much like the java.sql.ResultSet. That would save the conversion Date -> String -> Date or whatever. You can then just ask for what you want: a Date or a long.

I am happy to provide a pull request if you provide some technical guidance on what you want the solution to look like.

We've had plenty of problems with this PoiSheet because we want numbers as Strings, exactly as they appear in the spreadsheet, so the double coming back from numeric cell types is not helpful at all.

And this is not only about dates. It's about phone numbers too. I am getting this phone number back as:

3545876647 -> 3.545876647E9
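
For reference, the exponent form above is what Java's default double-to-String conversion produces for a value of that size. The following is a minimal JDK-only sketch that contrasts the default conversion with a plain decimal rendering. The class and method names are hypothetical and it does not use POI, so it does not apply Excel's cell format the way DataFormatter does; it only illustrates where the E9 comes from.

```java
import java.math.BigDecimal;

// Hypothetical demo class, JDK only; not part of spring-batch-excel or POI.
class NumericCellText {

    // Default conversion: what a numeric cell's double looks like when turned into a String naively.
    static String naive(double cellValue) {
        return String.valueOf(cellValue);
    }

    // Plain decimal rendering: no exponent, and no trailing ".0" for whole numbers.
    static String plain(double cellValue) {
        return BigDecimal.valueOf(cellValue).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        double phone = 3545876647d;
        System.out.println(naive(phone)); // prints 3.545876647E9
        System.out.println(plain(phone)); // prints 3545876647
    }
}
```

Note that a plain rendering like this still loses anything Excel shows through formatting alone, such as leading zeros, which is one reason reading the value as displayed is attractive.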

Currently the RowSet only has the getColumnValue method which returns a java.lang.String. I'm looking into an API more akin to the java.sql.ResultSet with getDate(int column) and getLong(int column). That way the user is more flexible and it potentially saves us from doing all the additional conversions.
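
To make the idea concrete, here is a rough sketch of what a ResultSet-style row could look like. Everything in it is an assumption for illustration: TypedRowSet, ArrayTypedRowSet and their methods are hypothetical names and not the spring-batch-excel API, and the row is backed by a plain Object[] instead of POI cells.

```java
import java.util.Date;

// Hypothetical sketch; none of these types exist in spring-batch-excel.
class TypedRowSetDemo {
    public static void main(String[] args) {
        TypedRowSet row = new ArrayTypedRowSet(3545876647d, new Date(0L), "42");
        System.out.println(row.getLong(0)); // prints 3545876647
        System.out.println(row.getLong(2)); // prints 42
    }
}

// Typed getters in the spirit of java.sql.ResultSet, indexed by column.
interface TypedRowSet {
    String getString(int column);
    long getLong(int column);
    Date getDate(int column);
}

// In-memory row that keeps the raw cell objects instead of pre-converting them to String.
class ArrayTypedRowSet implements TypedRowSet {
    private final Object[] cells;

    ArrayTypedRowSet(Object... cells) {
        this.cells = cells.clone();
    }

    @Override
    public String getString(int column) {
        return String.valueOf(cells[column]);
    }

    @Override
    public long getLong(int column) {
        Object value = cells[column];
        if (value instanceof Number) {
            // Numeric cell: no round trip through String, so no scientific notation.
            return ((Number) value).longValue();
        }
        // Text cell: fall back to parsing the text.
        return Long.parseLong(value.toString().trim());
    }

    @Override
    public Date getDate(int column) {
        Object value = cells[column];
        if (value instanceof Date) {
            return (Date) value;
        }
        throw new IllegalStateException("Column " + column + " does not hold a date");
    }
}
```

With getters like these the caller decides the target type, so a Date stays a Date and a phone number stored as a numeric cell can be read as a long directly.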

This would need changes in the RowSet API as well as in the Sheet abstraction, because the conversion is now always done in the Sheet abstraction. Instead, we should return the raw underlying Row or Cell[].

This change might also uncover some weird relations between RowSet and Sheet. Maybe we should consider the separate Sheet instances to also implement the RowSet interface instead of having another intermediate object DefaultRowSet.

However, we might defer this until we have removed the JXL support (see #19), as that would simplify the API and make it POI only.

This sounds like a great idea. Using the BeanWrapperRowMapper I have run into annoying instances of Dates being turned into milliseconds, numbers returned in scientific notation, and numbers with a '.0' added to the end. I am having problems figuring out how to apply this paragraph in the javadoc:

To customize the way that RowSet values are converted to the desired type for injecting into the prototype there are several choices. You can inject PropertyEditor instances directly through the customEditors property, or you can override the createBinder(Object) and initBinder(DataBinder) methods, or you can provide a custom RowSet implementation.

commented

I added the DataFormatter in a branch of my fork.

Check it out here:
https://github.com/dusiema/spring-batch-extensions/tree/UseDataFormatter%239

This is what I am currently using in my project.

I just set a boolean in PoiItemReader to be able to decide if I want to use the DataFormatter or not.

Then in PoiSheet this is used.

I could of course open a pull request to get this into upstream, but I don't know if this is "elegant" enough.

Thank you dusiema,
I agree that your solution isn't elegant, but only because there is no way to configure the formatter. That said, it worked great for me!

commented

Glad it was useful. A configurable formatter could be supported by defining a DataFormatter property instead of a boolean and using it whenever one is defined. Then one could just add a format with Spring configuration.
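
A small sketch of that "optional formatter instead of a boolean" shape, under loud assumptions: FormattingSheet and its methods are hypothetical, and java.text.Format stands in for POI's DataFormatter so that the example needs nothing beyond the JDK. It is not the code from the branch above.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.Format;
import java.util.Locale;

// Hypothetical sketch: java.text.Format is a stand-in for POI's DataFormatter.
class FormattingSheet {

    // null means "no formatter configured, keep the default String conversion".
    private Format formatter;

    // Setter so the formatter can be injected as a bean property.
    void setFormatter(Format formatter) {
        this.formatter = formatter;
    }

    // Uses the configured formatter when there is one, otherwise String.valueOf.
    String cellText(Object rawValue) {
        if (formatter != null) {
            return formatter.format(rawValue);
        }
        return String.valueOf(rawValue);
    }

    public static void main(String[] args) {
        FormattingSheet sheet = new FormattingSheet();
        System.out.println(sheet.cellText(42.0)); // prints 42.0

        // Fixed symbols so the output does not depend on the default locale.
        sheet.setFormatter(new DecimalFormat("0.##", DecimalFormatSymbols.getInstance(Locale.ROOT)));
        System.out.println(sheet.cellText(42.0)); // prints 42
    }
}
```

The presence of the formatter doubles as the on/off switch, so the separate boolean goes away and the format itself becomes configurable.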

Internally we now use a DataFormatter or FormulaEvaluator depending on the type of the column. So this should be fixed with #65.