Package is deprecated. Use jl-sql
instead
Utility kit for working with JSON files/streams divided by \n
symbol.
JSON-representation of each record shouldn't contain line breaking symbol.
npm install -g jl
We have a log file /tmp/test.json
, that contains next records
{"ts": 1416595508, "type": "click"}
{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416466930, "type": "click"}
{"ts": 1416622653, "type": "buy", "price": 20}
{"ts": 1416699396, "type": "click"}
{"ts": 1416624334, "type": "click"}
{"ts": 1416518859, "type": "click"}
{"ts": 1416569870, "type": "click"}
{"ts": 1416573325, "type": "click"}
{"ts": 1416682270, "type": "click"}
jl-sql
utility supports query language very similar to SQL
cat /tmp/test.json \
| jl-sql 'SELECT type, COUNT(*) AS c, SUM(price) AS sum GROUP BY type ORDER BY c NUMERIC DESC'
{"type":"click","c":8,"sum":0}
{"type":"buy","c":2,"sum":30}
As you can see we receive that we expect.
Please note non-standard keyword NUMERIC
after ORDER BY
:
without this keyword, sorting will be by string representation, not by numerical value.
NULL
TRUE
FALSE
+
-
*
/
%
=
,==
!=
===
- Comparison with type check!==
- Negative comparison with type check>
<
>=
<=
AND
,&&
OR
,||
!
FROM_UNIXTIME(unixTimestamp)
- converts unix timestamp to DateUNIX_TIMESTAMP(date)
- converts Date to unix timestamp, date should be string in'YYYY-MM-DD hh:mm:ss'
formatDATE(date)
- returns date in'YYYY-MM-DD'
format
IF(expression, ifTrue, ifFalse)
COALESCE(expression1[, expression2[, ...]])
Right now, only very basic aggregating functions supported:
SUM(expression)
- counts summary of field's values. Non-numerical values will be ignoredMIN(expression)
- returns minimal field's value. Non-numerical values will be ignoredMAX(expression)
- returns maximal field's value. Non-numerical values will be ignoredCOUNT(expression)
- counts amount of elements for which comparison withexpression
will be !== null or !== undefinedCOUNT(*)
- counts amount of elementsHLL_COUNT_DISTINCT(arg1[, args2[, ...]])
- counts amount of unique argument combinations. UsesO(1)
of memory andO(N)
of CPU, but returns result with 0.1% error. For more information read HyperLogLog algorithm.
JOIN
is not supported(and probably never will be)- Sorting by multiple fields is not supported
LIMIT
is not supported, but could be replaced withhead
andtail
SQL-interface only wrapper for utility kit described down below. By combining those utilities you can do much more than with SQL.
Let's filter only events with type buy
cat /tmp/test.json \
| jl-filter 'type == "buy"'
{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416622653, "type": "buy", "price": 20}
Add sum calculation for all price
fields
cat /tmp/test.json \
| jl-filter 'type == "buy"'
| jl-sum price
{"value":30}
To get just 30
you can add jl-extract value
at the end - this command extract value of the field.
Let sort log file by ts
field
cat /tmp/test.json \
| jl-sort ts
{"ts": 1416466930, "type": "click"}
{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416518859, "type": "click"}
{"ts": 1416569870, "type": "click"}
{"ts": 1416573325, "type": "click"}
{"ts": 1416595508, "type": "click"}
{"ts": 1416622653, "type": "buy", "price": 20}
{"ts": 1416624334, "type": "click"}
{"ts": 1416682270, "type": "click"}
{"ts": 1416699396, "type": "click"}
The standard arguments of sort
utility is also supported: -r
, -n
, -u
, -m
, -s
, -T
, -S
Let's add to each object field date
, that contains event date in UTC format
cat /tmp/test.json \
| jl-transform '{r.date = (new Date(r.ts * 1000)).toUTCString()}'
{"ts":1416595508,"type":"click","date":"Fri, 21 Nov 2014 18:45:08 GMT"}
{"ts":1416478467,"type":"buy","price":10,"date":"Thu, 20 Nov 2014 10:14:27 GMT"}
{"ts":1416466930,"type":"click","date":"Thu, 20 Nov 2014 07:02:10 GMT"}
{"ts":1416622653,"type":"buy","price":20,"date":"Sat, 22 Nov 2014 02:17:33 GMT"}
{"ts":1416699396,"type":"click","date":"Sat, 22 Nov 2014 23:36:36 GMT"}
{"ts":1416624334,"type":"click","date":"Sat, 22 Nov 2014 02:45:34 GMT"}
{"ts":1416518859,"type":"click","date":"Thu, 20 Nov 2014 21:27:39 GMT"}
{"ts":1416569870,"type":"click","date":"Fri, 21 Nov 2014 11:37:50 GMT"}
{"ts":1416573325,"type":"click","date":"Fri, 21 Nov 2014 12:35:25 GMT"}
{"ts":1416682270,"type":"click","date":"Sat, 22 Nov 2014 18:51:10 GMT"}
The system pipes are inefficient because of serialization/deserialization of JSON each time when working with multiple jl-
utilities because each utility receives and returns JSON using stdin/stdout.
To solve this problem jl-
utilities supports internal piping which allows to multiple utilities interact with each other in the context of a single process without spending resources for parsing. To use internal pipes replace |
to \|
between jl-
commands in the command line. For example:
cat /tmp/test.json \
| jl-filter 'type == "buy"'
| jl-sum price
you can change to:
cat /tmp/test.json \
| jl-filter 'type == "buy"'
\| jl-sum price
CPU consumption of all utilities except for jl-sort
is O(n)
, by memory is O(1)
.
The performance of jl-sort
completely depends on system's sort
realization, additional CPU consumptions also O(n)
, and by memory O(1)
.
By using internal pipes significantly increases performance because JSON parsing is very expensive operation. But in another hand, all operations in pipes is computing in a single thread, so with very long pipes you should keep the balance.
Requires single argument - SQL and has few options:
-T DIR
- temporary directory which will be used for sorting if it didn't use RAM.$TMPDIR
or/tmp
by default. Pay attention when sorting big amount of data, by having/tmp
on the RAM disk.-S BUF
- size of sorting buffer in bytes. On buffer overflow, filesystem will be used(see-T
option)
Wrapper for GNU Sort, allowing to sort JSON. Uses all advantages of sort
, also, supports sorting in filesystem, merge-sort, stable sorting, limit buffer size.
Alternative to grep
, but for JSON.
General utility that produces welding elements into a single value for each group. Expects 3 JS-functions arguments:
-i FUNC
- accumulator initialization: gets called for each group. The function executes in a group context.-u FUNC
- accumulator update: the function that will be executed for each element of a group. The function executes in a group context and receives argumentr
, containing each element of a thread.-r FUNC
- received accumulator's value: will be executed at the end of group with the passed result. The function executes in a group context.
and one required argument
-k KEYDEF
- key which will be used for grouping, could be a function. For correct execution thread's input should be sorted by this key in any direction.
Here is simple example of using jl-sum
utility, which counts sum for amount
field for each user that gets identified by uid
field:
jl-reduce -k uid -i '{this.sum = 0}' -u '{this.sum += r.amount}' -r '{return this.sum}'
For thread's input containing:
{"uid": 1, "amount": 10}
{"uid": 1, "amount": 11}
{"uid": 2, "amount": 12}
{"uid": 3, "amount": 13}
Output will be:
{"key":1,"value":21}
{"key":2,"value":12}
{"key":3,"value":13}
If -k
argument will be missing, convolution will go over whole thread and will return:
{"value":46}
Predefined jl-reduce
, which counts sum of parameter in group
Predefined jl-reduce
, which counts amount of elements in group