edwardcapriolo / filecrush

Remedy small files by combining them into larger ones.

Avro support

thanigaiv opened this issue

Does it support merging Avro files? When I run the following command, I get an error saying the format is not a FileInputFormat.

Command:
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush -Ddfs.block.size=134217728 \
    --input-format="org.apache.avro.mapred.AvroInputFormat" \
    --output-format="org.apache.avro.mapred.AvroInputFormat" \
    /data/dir /data/dir-merge 20100222177812

Error:
Not a FileInputFormat:org.apache.avro.mapred.AvroInputFormat
at com.m6d.filecrush.crush.Crush.createJobConfAndParseArgs(Crush.java:531)

You need to have Avro on your classpath. HADOOP_CLASSPATH puts the jar on the client JVM, which is what lets Crush load the format classes while parsing its arguments in createJobConfAndParseArgs; -libjars ships the same jar to the map and reduce tasks.

e.g.

HADOOP_CLASSPATH='/opt/cloudera/parcels/CDH/jars/avro-tools-1.7.6-cdh5.5.2.jar' \
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush \
    -libjars /opt/cloudera/parcels/CDH/jars/avro-tools-1.7.6-cdh5.5.2.jar \
    --input-format="org.apache.avro.mapred.AvroInputFormat" \
    --output-format="org.apache.avro.mapred.AvroOutputFormat" \
    compact-test/2016/06/22 compact-test-out/06/22 $(date +%Y%m%d%H%M%S)
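The jar path above is CDH-specific. A rough sketch of the equivalent on other installs, assuming the Avro jar lives somewhere under /usr/lib or /opt (adjust the search path for your distribution; the trailing ... stands for the same format and path arguments as above):

# Locate an Avro tools jar and use it for both the client and task classpaths.
AVRO_JAR=$(find /usr/lib /opt -name 'avro-tools-*.jar' 2>/dev/null | head -1)
HADOOP_CLASSPATH="$AVRO_JAR" hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush \
    -libjars "$AVRO_JAR" ...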

I'm now running into what appears to be a schema problem, but that's another story...

2016-06-22 15:44:09,639 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:1012)
    at org.apache.avro.Schema.parse(Schema.java:1064)
    at org.apache.avro.mapred.AvroJob.getOutputSchema(AvroJob.java:143)
    at org.apache.avro.mapred.AvroOutputFormat.getRecordWriter(AvroOutputFormat.java:153)
    at com.m6d.filecrush.crush.CrushReducer.createRecordWriter(CrushReducer.java:389)
    at com.m6d.filecrush.crush.CrushReducer.reduce(CrushReducer.java:299)
    at com.m6d.filecrush.crush.CrushReducer.reduce(CrushReducer.java:47)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
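Judging by the trace, AvroOutputFormat.getRecordWriter asks AvroJob.getOutputSchema for the avro.output.schema job property, which Crush never sets, so Schema.Parser is handed a null string. A hedged sketch of one possible workaround, not a confirmed fix: read the schema off an existing file with avro-tools getschema and pass it through as a generic option (the part file name below is a made-up placeholder, and whether Crush carries the property through to its writers is untested):

# Pull the writer schema from one of the existing files (placeholder name).
SCHEMA=$(hadoop jar /opt/cloudera/parcels/CDH/jars/avro-tools-1.7.6-cdh5.5.2.jar getschema \
    compact-test/2016/06/22/part-00000.avro)

# Hand the schema to the job so AvroJob.getOutputSchema has something to parse.
HADOOP_CLASSPATH='/opt/cloudera/parcels/CDH/jars/avro-tools-1.7.6-cdh5.5.2.jar' \
hadoop jar filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush \
    -Davro.output.schema="$SCHEMA" \
    -libjars /opt/cloudera/parcels/CDH/jars/avro-tools-1.7.6-cdh5.5.2.jar \
    --input-format="org.apache.avro.mapred.AvroInputFormat" \
    --output-format="org.apache.avro.mapred.AvroOutputFormat" \
    compact-test/2016/06/22 compact-test-out/06/22 $(date +%Y%m%d%H%M%S)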

I tried a fork or two, but they didn't seem to output anything for me.

Was this solved?