oneness / csvx

A dependency-free tool that enables you to control how to tokenize, transform and handle files with char(s) separated values.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CSVX

A dependency-free tool that enables you to control how to tokenize, transform and handle files with char(s) separated values.

Usage

Not sure if I am going to publish this to Clojars but if you are using tools.deps, you can just add following to deps.edn to add it to your project.

{:deps
  {github-oneness/csvx
    {:git/url "https://github.com/oneness/csvx"
     :sha "4d4ac868a0c6bdf9a5a6c32076303f1171c74db4"}}}
;; Then require from your repl like this:
(require '[csvx.core :as csvx])
;; You can just use the one public fn `readx` without any options to parse csv:
(csvx/readx "resources/100-sales-records.csv")
;; Note that csvx/readx takes optional arg where you can pass in following
;; options: (listed values here are defaults if no option is given See src/csvx/core.clj
;; for details).
{:encoding "UTF-8"
 :max-lines-to-read Integer/MAX_VALUE
 :line-tokenizer (fn [^String line]
		       (try (.split line ",")
			    (catch Exception e
			      (println :malformed-line line)
			      ;; (strace/print-stack-trace e) ;; handle exception here
                          nil)))
 :line-transformer #(map-indexed hash-map %)}

Say you have a giant json file that you need to parse into Clojure map, all you need to do is to to write line tokenizer and transformer like this to get it working from repl. Remember to pass in how many lines you want to read in while you are experimenting with your repl as not to blow it up:

(defn decode-json [^String file-path]
  (readx file-path ;;"sample.json"
	   {:max-lines-to-read 1
	    :line-tokenizer (fn [line]
			      (map #(.split ^String % ":")
				   (-> (clojure.string/replace line #"\{|\}" "")
				       (.split ","))))
	    :line-transformer (fn [line]
				(reduce (fn [acc [k v]]
					  (merge acc
						 {(-> k read-string keyword) (read-string v)}))
					{}
					line))}))

About

A dependency-free tool that enables you to control how to tokenize, transform and handle files with char(s) separated values.

License:Eclipse Public License 2.0


Languages

Language:Clojure 100.0%