hbutani / spark-druid-olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Home Page: http://sparklinedata.com/

Setting up the dataset as part of the spark-druid quick start throws a JSON parser error

manasadanda opened this issue

I have followed the steps in the quick start guide to test Spark SQL with Druid. While executing the command that sets up the Druid index over the raw data, I get a JSON object error. Could you please help identify what is causing the issue? I have imported Jackson to parse JSON in Spark. Below is the error message.

scala> sql("""
| CREATE TEMPORARY TABLE orderLineItemPartSupplier
| USING org.sparklinedata.druid
| OPTIONS (sourceDataframe "orderLineItemPartSupplierBase",
| timeDimensionColumn "l_shipdate",
| druidDatasource "tpch",
| druidHost "localhost",
| druidPort "8082",
| columnMapping '{ "l_quantity" : "sum_l_quantity", "ps_availqty" : "sum_ps_availqty", "cn_name" : "c_nation", "cr_name" : "c_region", "sn_name" : "s_nation", "sr_name" : "s_region" } ',
| functionalDependencies '[ {"col1" : "c_name", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_phone", "col2" : "c_address", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_mktsegment", "type" : "n-1"}, {"col1" : "c_name", "col2" : "c_comment", "type" : "1-1"}, {"col1" : "c_name", "col2" : "c_nation", "type" : "n-1"}, {"col1" : "c_nation", "col2" : "c_region", "type" : "n-1"} ] ',
| starSchema ' { "factTable" : "orderLineItemPartSupplier", "relations" : [] } ')
| """.stripMargin
| )
org.json4s.package$MappingException: Do not know how to convert JObject(List()) into class java.lang.String
at org.json4s.Extraction$.convert(Extraction.scala:559)
at org.json4s.Extraction$.extract(Extraction.scala:331)
at org.json4s.Extraction$.extract(Extraction.scala:42)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
at org.sparklinedata.druid.client.DruidClient.timeBoundary(DruidClient.scala:122)
at org.sparklinedata.druid.client.DruidClient.metadata(DruidClient.scala:130)
at org.sparklinedata.druid.metadata.DruidRelationInfo$.apply(DruidRelationInfo.scala:62)
at org.sparklinedata.druid.DefaultSource.createRelation(DefaultSource.scala:89)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.execution.datasources.CreateTempTableUsing.run(ddl.scala:93)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC.<init>(<console>:69)
at $iwC.<init>(<console>:71)
at <init>(<console>:73)
at .<init>(<console>:77)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Can you check whether your Druid daemons are up? This error typically happens when an HTTP request returns an HTTP 500 error.
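
As a quick sanity check, here is a minimal sketch (not part of the library) that POSTs a timeBoundary query straight to the broker, assuming the same localhost:8082 and tpch datasource passed in the OPTIONS above; a connection failure or 500 here reproduces the condition described:

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Minimal connectivity check against the Druid broker (host/port and datasource
// taken from the OPTIONS above). A healthy broker answers HTTP 200 with a JSON
// array; a 500 here is what surfaces in Spark as the MappingException.
object DruidPing {
  def main(args: Array[String]): Unit = {
    val query = """{"queryType":"timeBoundary","dataSource":"tpch"}"""
    val conn = new URL("http://localhost:8082/druid/v2")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(query.getBytes("UTF-8"))

    val code = conn.getResponseCode
    println(s"HTTP $code")
    val stream = if (code < 400) conn.getInputStream else conn.getErrorStream
    println(Source.fromInputStream(stream).mkString)
  }
}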

I've verified that the Druid services are running. There are no HTTP proxies and I am trying this on my local machine. I was able to resolve the above error by importing the Jackson libraries to parse JSON, but now I am getting a different error. I've tried testing Sparkline using spark-shell and also as a stand-alone app, per the instructions in:
https://github.com/SparklineData/spark-druid-olap/wiki/Quick-Start-Guide

I am sort of stuck at this point; any help would be greatly appreciated. Thank you.


sparkline_test.zip

spark-submit --packages com.databricks:spark-csv_2.10:1.1.0,SparklineData:spark-datetime:0.0.2 --jars /Users/dandama/Downloads/spark-druid-olap-assembly-0.0.3.jar --class "SparklineTest" --master local[2] target/scala-2.10/sample-project_2.10-1.0.jar
Ivy Default Cache set to: /Users/dandama/.ivy2/cache
The jars for the packages stored in: /Users/dandama/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark/1.6.1/libexec/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
SparklineData#spark-datetime added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.10;1.1.0 in list
found org.apache.commons#commons-csv;1.1 in list
found com.univocity#univocity-parsers;1.5.1 in list
found SparklineData#spark-datetime;0.0.2 in spark-packages
found com.github.nscala-time#nscala-time_2.10;1.6.0 in list
found joda-time#joda-time;2.5 in list
found org.joda#joda-convert;1.2 in central
:: resolution report :: resolve 457ms :: artifacts dl 18ms
:: modules in use:
SparklineData#spark-datetime;0.0.2 from spark-packages in [default]
com.databricks#spark-csv_2.10;1.1.0 from list in [default]
com.github.nscala-time#nscala-time_2.10;1.6.0 from list in [default]
com.univocity#univocity-parsers;1.5.1 from list in [default]
joda-time#joda-time;2.5 from list in [default]
org.apache.commons#commons-csv;1.1 from list in [default]
org.joda#joda-convert;1.2 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 7 | 0 | 0 | 0 || 7 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 7 already retrieved (0kB/16ms)
JObject(List((l_quantity,JString(sum_l_quantity)), (ps_availqty,JString(sum_ps_availqty)), (cn_name,JString(c_nation)), (cr_name,JString(c_region)), (sn_name,JString(s_nation)), (sr_name,JString(s_region))))
Exception in thread "main" com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'JObject': was expecting ('true', 'false' or 'null')
at [Source: JObject(List((l_quantity,JString(sum_l_quantity)), (ps_availqty,JString(sum_ps_availqty)), (cn_name,JString(c_nation)), (cr_name,JString(c_region)), (sn_name,JString(s_nation)), (sr_name,JString(s_region)))); line: 1, column: 8]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
at org.sparklinedata.druid.DefaultSource$$anonfun$6.apply(DefaultSource.scala:66)
at org.sparklinedata.druid.DefaultSource$$anonfun$6.apply(DefaultSource.scala:65)
at scala.Option.map(Option.scala:145)
at org.sparklinedata.druid.DefaultSource.createRelation(DefaultSource.scala:65)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.execution.datasources.CreateTempTableUsing.run(ddl.scala:92)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at SparklineTest$.main(SparklineTest.scala:127)
at SparklineTest.main(SparklineTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
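
For reference, the "Unrecognized token 'JObject'" message shows Jackson being handed the toString of an already-parsed json4s value (JObject(List(...))) rather than raw JSON, which suggests the columnMapping option in the stand-alone app was built as a json4s AST and then stringified. A minimal sketch of the difference, assuming json4s is on the classpath (the mapping content is abbreviated from the command above):

import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

// Building the mapping as a json4s AST and calling .toString yields
// "JObject(List((l_quantity,JString(sum_l_quantity)), ...))" -- exactly the
// string the parser rejects in the trace above.
val mappingAst: JObject =
  ("l_quantity" -> "sum_l_quantity") ~ ("ps_availqty" -> "sum_ps_availqty")

val wrong: String = mappingAst.toString          // JObject(List(...)) -- not JSON
val right: String = compact(render(mappingAst))  // {"l_quantity":"sum_l_quantity",...}

// The data source expects the option value to be a raw JSON string, so either
// pass a plain string literal (as in the quick start SQL) or `right` above.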

Looks like you are using the 0.0.3 version: /Users/dandama/Downloads/spark-druid-olap-assembly-0.0.3.jar

Can you try with the latest 0.1.0? You will need Spark 1.6 and Druid 0.8 or Druid 0.9.
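
If it is unclear which assembly the driver actually loaded, one quick check from the shell is the sketch below (the class name is taken from the stack traces above):

// Print the jar that supplied the data source class seen in the traces.
// getCodeSource can be null for bootstrap classes, but not for an assembly jar.
val src = Class.forName("org.sparklinedata.druid.DefaultSource")
  .getProtectionDomain.getCodeSource
println(if (src != null) src.getLocation else "unknown")
// e.g. file:/Users/dandama/Downloads/spark-druid-olap-assembly-0.0.3.jar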