damballa / parkour

Hadoop MapReduce in idiomatic Clojure.

avro/dsink determine output path from input?

stanfea opened this issue

Hi,

Is there any way to have something like ::mr/sink-as dux/prefix-keys that creates a subdirectory instead of adding a prefix?
Or a way to have an avro dseq as input that filters by a prefix in the filename?

Thanks!

Stefan

The dux/prefix-keys prefixes can contain "/" (any number of them, even), which creates the files in subdirectories of the output directory. Note that this currently breaks the return-value dseq that the parkour.graph API creates for the job results; a PR fixing the issue would be welcome. I personally don't use this feature much, and the question of exactly what to return is a bit tricky: one dseq over all the sunk content, or an inferred structured division into multiple outputs.
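
As a rough sketch of the prefix side of this in plain Clojure (not run against parkour itself): it assumes each record is a map with a hypothetical :source field naming where it came from, and that the sink expects [prefix key] tuples, which is my reading of dux/prefix-keys. The names subdir-prefix and tag-with-prefix are made up for illustration.

```clojure
;; Assumption: records are maps carrying a hypothetical :source field.
(defn subdir-prefix
  "Builds a prefix such as \"logs-a/part\" for a record. The \"/\" in the
  prefix is what would place the file in a subdirectory of the output dir."
  [record]
  (str (:source record) "/part"))

;; Assumption: the sink consumes [prefix key] tuples.
(defn tag-with-prefix
  "Pairs each record with the prefix derived from it."
  [records]
  (map (fn [record] [(subdir-prefix record) record]) records))

(tag-with-prefix [{:source "logs-a" :n 1}
                  {:source "logs-b" :n 2}])
```

Each record ends up paired with a prefix derived from its own content, so the output path follows from the input rather than being fixed up front.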

"Filters by prefix in filename" -- you can always use globs over the prefix as input: (mra/dseq [:default] "previous-output-dir/some-prefix-*").

Wow, thanks, this is really genius work you've done here!