damballa / parkour

Hadoop MapReduce in idiomatic Clojure.

Seeing data-reader mapping error on clojure 1.6

Hey,

I'm trying to get Parkour running on our internal data system, and I'm getting an error that I can't diagnose (the underlying bug, CLJ-1034, is supposedly fixed as of Clojure 1.5). Any ideas?

Here is the error:

clojure.lang.ExceptionInfo: Conflicting data-reader mapping {:url #<URL jar:file:/prod-analytics-0.1.0-SNAPSHOT-standalone.jar!/data_readers.clj>, :conflict hadoop.conf/configuration, :mappings {parkour/dval #'parkour.io.dval/dval-reader, parkour/dcpath #'parkour.io.dval/dcpath-reader, java.net/uri #'parkour.fs/uri, hadoop.mapreduce/job #'parkour.mapreduce/job, hadoop.fs/path #'parkour.fs/path, hadoop.conf/configuration #'parkour.conf/configuration}}
        at clojure.core$ex_info.invoke(core.clj:4227)
        at clojure.core$load_data_reader_file$fn__6356.invoke(core.clj:6671)
        at clojure.core.protocols$fn__5871.invoke(protocols.clj:76)
        at clojure.core.protocols$fn__5828$G__5823__5841.invoke(protocols.clj:13)
        at clojure.core$reduce.invoke(core.clj:6030)
        at clojure.core$load_data_reader_file.invoke(core.clj:6664)
        at clojure.core.protocols$fn__5883.invoke(protocols.clj:128)
        at clojure.core.protocols$fn__5854$G__5849__5863.invoke(protocols.clj:19)
        at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
        at clojure.core.protocols$fn__5877.invoke(protocols.clj:48)
        at clojure.core.protocols$fn__5828$G__5823__5841.invoke(protocols.clj:13)
        at clojure.core$reduce.invoke(core.clj:6030)
        at clojure.core$load_data_readers$fn__6360.invoke(core.clj:6683)
        at clojure.lang.AFn.applyToHelper(AFn.java:161)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at clojure.lang.Var.alterRoot(Var.java:336)
        at clojure.core$alter_var_root.doInvoke(core.clj:4839)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at clojure.core$load_data_readers.invoke(core.clj:6680)
        at clojure.core$fn__6363.invoke(core.clj:6686)
        at clojure.core__init.load(Unknown Source)
        at clojure.core__init.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at clojure.lang.RT.loadClassForName(RT.java:2056)
        at clojure.lang.RT.load(RT.java:419)
        at clojure.lang.RT.load(RT.java:400)
        at clojure.lang.RT.doInit(RT.java:436)
        at clojure.lang.RT.<clinit>(RT.java:318)
        at clojure.lang.Namespace.<init>(Namespace.java:34)
        at clojure.lang.Namespace.findOrCreate(Namespace.java:176)
        at clojure.lang.Var.internPrivate(Var.java:163)
        at prod_analytics.core.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
Exception in thread "main" java.lang.ExceptionInInitializerError
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at clojure.lang.RT.loadClassForName(RT.java:2056)
        at clojure.lang.RT.load(RT.java:419)
        at clojure.lang.RT.load(RT.java:400)
        at clojure.lang.RT.doInit(RT.java:436)
        at clojure.lang.RT.<clinit>(RT.java:318)
        at clojure.lang.Namespace.<init>(Namespace.java:34)
        at clojure.lang.Namespace.findOrCreate(Namespace.java:176)
        at clojure.lang.Var.internPrivate(Var.java:163)
        at prod_analytics.core.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

and here is my project.clj:

(defproject prod-analytics "0.1.0-SNAPSHOT"
  :url ""
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojure-csv/clojure-csv "2.0.1"]
                 [org.clojure/algo.generic "0.1.2"]
                 [com.damballa/parkour "0.6.1"]
                 [org.apache.avro/avro "1.7.5"]
                 [org.apache.avro/avro-mapred "1.7.5"
                  :classifier "hadoop2"]
                 [org.codehaus.jsr166-mirror/jsr166y "1.7.0"]
                 ]
  :global-vars {*warn-on-reflection* true}
  :exclusions [org.apache.hadoop/hadoop-core
               org.apache.hadoop/hadoop-common
               org.apache.hadoop/hadoop-hdfs
               org.slf4j/slf4j-api org.slf4j/slf4j-log4j12 log4j
               org.apache.avro/avro
               org.apache.avro/avro-mapred
               org.apache.avro/avro-ipc]

  :repositories [["conjars" "http://conjars.org/repo"]
                ["cloudera" "https://repository.cloudera.com/content/repositories/releases"]]

  :main prod-analytics.core
  :profiles {:provided
             {:dependencies
              [[org.apache.hadoop/hadoop-client "2.0.0-mr1-cdh4.2.0"]
               [org.apache.hadoop/hadoop-core "2.0.0-mr1-cdh4.2.0"] 
               [org.apache.hadoop/hadoop-common "2.0.0-cdh4.2.0"]
               [org.slf4j/slf4j-api "1.6.1"]
               [org.slf4j/slf4j-log4j12 "1.6.1"]
               [log4j/log4j "1.2.17"]]}
             :aot {:aot :all, :compile-path "target/aot/classes"}
             :uberjar [:aot]
             :jobjar [:aot]})

Hmm. This is not a problem I've seen recently myself... What version of Leiningen are you using, and in what context are you getting this exception?

Leiningen 2.4.1, and I get this error when I run 'hadoop jar' on the standalone uberjar.

Potential failure of imagination, but I'm just not seeing how it's possible to get that exception with Clojure 1.6, at least without having an actual different data-reader var for hadoop.conf/configuration. Maybe verify that the JAR's clojure/core.clj file is in fact for 1.6?
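One way to run that check from the command line is sketched below. It assumes Info-ZIP's unzip is installed and that the uberjar carries the clojure/version.properties resource that ships in Clojure's own JAR; jar_clojure_version is a hypothetical helper name, not part of Parkour or Leiningen. It only reports what is inside the JAR, not which Clojure the JVM actually loads at run time.

```shell
# jar_clojure_version JAR
# Print the Clojure version recorded inside JAR, read from the
# clojure/version.properties resource (assumed to hold a "version=..." line).
# Prints nothing if the JAR or the resource is missing.
jar_clojure_version() {
    unzip -p "$1" clojure/version.properties 2>/dev/null |
        sed -n 's/^version=//p'
}

# Example, using the JAR path from this thread:
# jar_clojure_version target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar
```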

I hear you, I'm having a similar struggle. Starting "lein repl" from the same directory prints the Clojure version; this is what I see:

nREPL server started on port 53875 on host 127.0.0.1 - nrepl://127.0.0.1:53875
REPL-y 0.3.5, nREPL 0.2.6
Clojure 1.6.0

I tried to reproduce this a few ways, and the only way I could do it was by sneaking a version of Clojure 1.5.x onto the classpath. Could you try the following in your environment with your JAR?:

lein do clean, uberjar
zip -d target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar data_readers.clj META-INF/MANIFEST.MF
hadoop jar target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar clojure.main -e '(prn *clojure-version*)'

That is pretty amazing, I have to admit. Doing this, I find that I'm somehow getting not 1.5 but 1.4. I have no idea how it could even be there.

$ hadoop jar prod-analytics-0.1.0-SNAPSHOT-standalone.jar  clojure.main -e '(prn *clojure-version*)'
{:major 1, :minor 4, :incremental 0, :qualifier nil}

I'd check your HADOOP_CLASSPATH environment variable, the contents of your configuration hadoop-env.sh, and the output of running hadoop classpath. At least one of those should reveal the offending JAR, and thus hopefully where it came from.
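A rough sketch of how that search could be automated follows. It assumes a POSIX shell and Info-ZIP's unzip, and it treats any JAR containing clojure/core__init.class (the class named in the stack trace above) as a copy of Clojure. The function names expand_classpath and jars_with_clojure are made up for illustration.

```shell
# expand_classpath CLASSPATH
# Print one existing JAR path per line for a colon-separated classpath.
# Java-style wildcard entries ("dir/*") are expanded to the JARs in dir.
# The body runs in a subshell so the IFS and globbing changes do not leak.
expand_classpath() (
    IFS=:
    set -f                      # do not glob "dir/*" while splitting on ':'
    for entry in $1; do
        case $entry in
            */\*)
                set +f          # re-enable globbing to list the directory
                for jar in "${entry%\*}"*.jar; do
                    if [ -f "$jar" ]; then printf '%s\n' "$jar"; fi
                done
                set -f
                ;;
            *.jar)
                if [ -f "$entry" ]; then printf '%s\n' "$entry"; fi
                ;;
        esac
    done
)

# jars_with_clojure CLASSPATH
# Print each JAR on the classpath that bundles Clojure's core namespace.
jars_with_clojure() {
    expand_classpath "$1" | while IFS= read -r jar; do
        if unzip -l "$jar" 2>/dev/null | grep -q 'clojure/core__init\.class'; then
            printf '%s\n' "$jar"
        fi
    done
}

# Example:
# jars_with_clojure "$(hadoop classpath)"
# jars_with_clojure "$HADOOP_CLASSPATH"
```

Any JAR this prints other than your own uberjar is a candidate for the stray Clojure version. Plain directory entries on the classpath are not inspected by this sketch.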

Due to similar issues (albeit not yet with Clojure itself), I generally avoid the hadoop jar command and use hadoop classpath to build a java command line that places my own (uber) JAR first. E.g., java -cp "example-standalone.jar:$(hadoop classpath)" clojure.main -m example.core. You may also need or want to set your distribution-specific java.library.path or other native library path properties.

Thanks for the advice. I'm really excited to use parkour and you helped out a great deal here!

@cpb83 Did you solve this issue? Thanks