A bug in WebLogDataExample
yumg opened this issue · comments
When I run the example org.deeplearning4j.datapipelineexamples.transform.basic.WebLogDataExample, I get the following exception:
11:17:23.867 [Executor task launch worker for task 8] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 2.0 (TID 8)
java.lang.IllegalArgumentException: Invalid format: "01/Jul/1995:00:00:01 -0400" is malformed at "Jul/1995:00:00:01 -0400"
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:752)
at org.datavec.api.transform.transform.time.StringToTimeTransform.map(StringToTimeTransform.java:243)
at org.datavec.api.transform.transform.BaseColumnTransform.map(BaseColumnTransform.java:92)
at org.datavec.spark.transform.transform.SparkTransformFunction.call(SparkTransformFunction.java:48)
at org.datavec.spark.transform.transform.SparkTransformFunction.call(SparkTransformFunction.java:32)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This happens because my default locale is Locale.CHINA, so the formatter cannot recognize the English month abbreviation Jul.
The locale needs to be specified explicitly.
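The locale dependence can be reproduced outside DataVec. The sketch below is only an analogue: it uses the JDK's java.time formatter rather than the Joda-Time formatter from the stack trace, and the class and method names (LocaleParseDemo, parsesWith) are made up for illustration. It parses the timestamp from the exception with month names read first in English and then in Chinese.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class; java.time is used here in place of Joda-Time.
final class LocaleParseDemo {
    // Same shape as the web log timestamp; MMM is a locale-dependent month name.
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";
    static final String SAMPLE = "01/Jul/1995:00:00:01 -0400";

    /** Returns true if SAMPLE parses when month names are read in the given locale. */
    static boolean parsesWith(Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(PATTERN, locale);
        try {
            OffsetDateTime.parse(SAMPLE, formatter);
            return true;
        } catch (DateTimeParseException e) {
            // The month text did not match any month name of this locale.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Locale.ENGLISH parses: " + parsesWith(Locale.ENGLISH));
        System.out.println("Locale.CHINA parses:   " + parsesWith(Locale.CHINA));
    }
}
```

A formatter built without an explicit locale falls back to the JVM default, which is why the same input can parse on one machine and fail on another.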
There is an API that sets the locale for the date-string formatter explicitly: org.datavec.api.transform.TransformProcess.Builder.stringToTimeTransform(String column, String format, DateTimeZone dateTimeZone, Locale locale)
So the bug can be fixed by changing the call on line 140 of WebLogDataExample from stringToTimeTransform(String column, String format, DateTimeZone dateTimeZone) to that overload.
That seemed like the right solution. Unfortunately, the overload that takes an explicit locale does not work either.
I will work on it.
The problem turned out to be in org.datavec.api.transform.transform.time.StringToTimeTransform.
Its readObject method has a bug: it rebuilds the formatter without the locale, so the locale is lost whenever the transform is deserialized:
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if(timeFormat != null)
            formatter = DateTimeFormat.forPattern(timeFormat).withZone(timeZone);
        else {
            List<DateTimeFormatter> dateFormatList = new ArrayList<>();
            formatters = new DateTimeFormatter[formats.length];
            for(int i = 0; i < formatters.length; i++) {
                dateFormatList.add(DateTimeFormat.forPattern(formats[i]).withZone(timeZone));
            }
            formatters = dateFormatList.toArray(new DateTimeFormatter[dateFormatList.size()]);
        }
    }
It should be like this:
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (timeFormat != null) {
            // Rebuild the single formatter, re-applying the locale if one was set.
            if (locale != null) {
                this.formatter = DateTimeFormat.forPattern(timeFormat).withZone(timeZone).withLocale(locale);
            } else {
                this.formatter = DateTimeFormat.forPattern(timeFormat).withZone(timeZone);
            }
        } else {
            // The multi-format branch is unchanged here and does not apply the locale.
            List<DateTimeFormatter> dateFormatList = new ArrayList<>();
            formatters = new DateTimeFormatter[formats.length];
            for(int i = 0; i < formatters.length; i++) {
                dateFormatList.add(DateTimeFormat.forPattern(formats[i]).withZone(timeZone));
            }
            formatters = dateFormatList.toArray(new DateTimeFormatter[dateFormatList.size()]);
        }
    }
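
To check the reasoning, here is a minimal sketch of the same failure mode using only the JDK. It is not the DataVec class: LocaleAwareParser and its restoreLocale flag are hypothetical, java.time stands in for Joda-Time, and the default locale is temporarily set to Locale.CHINA to imitate my environment. The formatter is transient, so it has to be rebuilt in readObject; if the rebuild skips the locale, a parser created with Locale.ENGLISH stops working after a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical stand-in for StringToTimeTransform; not the DataVec class.
final class LocaleAwareParser implements Serializable {
    private static final long serialVersionUID = 1L;
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";
    static final String SAMPLE = "01/Jul/1995:00:00:01 -0400";

    private final String timeFormat;
    private final Locale locale;                    // serialized with the object
    private final boolean restoreLocale;            // false imitates the buggy readObject
    private transient DateTimeFormatter formatter;  // not serialized, rebuilt on read

    LocaleAwareParser(String timeFormat, Locale locale, boolean restoreLocale) {
        this.timeFormat = timeFormat;
        this.locale = locale;
        this.restoreLocale = restoreLocale;
        this.formatter = DateTimeFormatter.ofPattern(timeFormat, locale);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (restoreLocale && locale != null) {
            // Fixed behaviour: re-apply the stored locale.
            formatter = DateTimeFormatter.ofPattern(timeFormat, locale);
        } else {
            // Buggy behaviour: silently falls back to the JVM default locale.
            formatter = DateTimeFormatter.ofPattern(timeFormat);
        }
    }

    boolean canParse(String text) {
        try {
            OffsetDateTime.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** Serializes and deserializes the parser in memory. */
    static LocaleAwareParser roundTrip(LocaleAwareParser parser) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(parser);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LocaleAwareParser) in.readObject();
        }
    }

    /** Builds an English parser under a Chinese default locale and tests it after a round trip. */
    static boolean parsesAfterRoundTrip(boolean restoreLocale) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.CHINA); // assumption: imitates the reporter's JVM default
        try {
            LocaleAwareParser original = new LocaleAwareParser(PATTERN, Locale.ENGLISH, restoreLocale);
            return roundTrip(original).canParse(SAMPLE);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy readObject parses: " + parsesAfterRoundTrip(false));
        System.out.println("fixed readObject parses: " + parsesAfterRoundTrip(true));
    }
}
```

This would also explain why passing the locale through the builder appeared to have no effect: the locale field survives serialization, but the formatter that is actually used is rebuilt without it.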