prasxanth / jx

A tiny utility to eXecute J commands on output from *nix CLI pipes.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

JX

JX is a tiny utility script to eXecute J commands on the output from *nix command-line pipes. The syntax is,

linux-command1 | linux-comand2 ... | jx `verb`

The `verb` first argument above is any J command that should be executed on the piped data (stdin in J parlance). Typical use-cases are provided in the Examples section.

Features

The following packages are automatically preloaded each time jx is called,

  1. stats
  2. stats/distribs
  3. math/fftw

More libraries can easily be added to the beginning of the script. Check the addons page for further information.

Currently jx can only deal with numeric data types. Use it on other types at your own risk!

Requirements

J needs to be installed and available on the system. ijconsole is used in jx and is assumed to be in /usr/bin/ijconsole. Replace the path in the script if this is not the case on your system.

This script has only been tested on WSL with Ubuntu 18.04.1 LTS (bionic).

Why?

J is a wonderful tool for rapid-prototyping. It is especially well suited for data analysis and transformations. It’s vast ecosystem and terse notation makes it especially suited for command-line data analysis. However, I could not find any solutions that run J commands on CSV type data.

The page on shell scripting provided an excellent starting point. I hope that this script benefits other beginners like me who are interested in leveraging the power of J. There are few languages that provide such potency in so few characters. Happy scripting!

Installation

Download the file called jx and give it exec permissions (chmod +x).

Alternatively, open the package.org file in Emacs, tangle to get the jx file and give it exec permissions.

Examples

Ensure shell is set to bash,

bash

A typical use case might be,

for i in $(seq 1 10); do echo $i; done | jx "*/"
3628800

Let’s consider a more realistic example using the classic iris dataset (see accompanying .csv for a copy). The excellent csvkit utility is used to extract information from the csv.

The dataset contains the following columns,

csvcut -n iris.csv
1:
  2: Sepal.Length
  3: Sepal.Width
  4: Petal.Length
  5: Petal.Width
  6: Species

Columns 2-5 are numeric,

csvcut -c 2-5 iris.csv | head
Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
5.1,3.5,1.4,0.2
4.9,3,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5,3.6,1.4,0.2
5.4,3.9,1.7,0.4
4.6,3.4,1.4,0.3
5,3.4,1.5,0.2
4.4,2.9,1.4,0.2

(+/ % #) is the mean in J. So to calculate the mean of the columns jx is used as follows,

csvcut -c 2-5 iris.csv | tail -n +2 |  ./jx "(+/ % #)"
5.84333,3.05733,3.758,1.19933

Since the stats addon is preloaded, various statistics could also be calculated as,

csvcut -c 2-5 iris.csv | tail -n +2 |  jx "(#, mean, median, stddev)"
150,150,150,150
5.84333,3.05733,3.758,1.19933
5.8,3,4.35,1.3
0.828066,0.435866,1.7653,0.762238

Or using the dstat (descriptive statistics),

csvcut -c 2-5 iris.csv | tail -n +2 |  jx "dstat"
"sample size:       150","sample size:       150","sample size:        150","sample size:        150"
"minimum:           4.3","minimum:             2","minimum:              1","minimum:            0.1"
"maximum:           7.9","maximum:           4.4","maximum:            6.9","maximum:            2.5"
"median:            5.8","median:              3","median:            4.35","median:             1.3"
"mean:          5.84333","mean:          3.05733","mean:             3.758","mean:           1.19933"
"std devn:     0.828066","std devn:     0.435866","std devn:        1.7653","std devn:      0.762238"
"skewness:     0.311753","skewness:     0.315767","skewness:     _0.272128","skewness:     _0.101934"
"kurtosis:      2.42643","kurtosis:      3.18098","kurtosis:       1.60446","kurtosis:       1.66393"

There are many more functions in the addon. See the stats page for further details.

One of the major advantages of using jx is that the entire J ecosystem is available. This facilitates calculations not normally available in many other command-line statistical packages.

For example, the cumulative standard deviation is easily calculated as,

csvcut -c 2-5 iris.csv | tail -n +2 |  jx "stddev \\" | head -n 10
0,0,0,0
0.141421,0.353553,0,0
0.2,0.251661,0.057735,3.39935e_17
0.221736,0.216025,0.0816497,0
0.207364,0.258844,0.0707107,0
0.288097,0.343026,0.13784,0.0816497
0.294392,0.313202,0.127242,0.0786796
0.274838,0.290012,0.119523,0.0744024
0.308671,0.316228,0.113039,0.0707107
0.291357,0.307137,0.108012,0.0788811

To calculate the fft,

csvcut -c 2-5 iris.csv | tail -n +2 |  jx "fftw" | head -n 10
876.5,458.6,563.7,179.9
_6.1619j53.5655,20.0516j_17.2772,_37.1193j145.149,_9.47561j62.0101
1.06414j32.9092,_7.05433j_9.32116,9.866j74.9139,3.54867j31.8348
_10.4029j_2.34397,_1.39779j_0.140479,_5.56232j_2.05679,1.27583j_0.313196
_3.82541j11.3016,1.11997j_3.65419,_10.001j32.7223,_0.713453j16.687
_7.80278j14.757,_6.34392j_0.147459,_1.59774j31.7484,_0.652352j17.7651
4.35145j_1.51585,0.4113j1.55174,4.72981j_2.98952,_0.141559j3.7895
_5.89409j6.59066,0.728365j_1.2031,_7.02916j14.2964,_3.42976j8.93904
0.639293j16.2784,_0.697862j2.37043,_2.56524j20.8029,_1.72254j8.12098
5.64899j_3.28078,6.46114j0.0480785,2.27507j_7.72158,0.337604j_1.55001

Contributions

This is a tiny useful tool that should nicely supplement other solutions available today. Of course, there are many venues for expanding this. Please fork this repo and continue development!

This code is released under GPLv3.

About

A tiny utility to eXecute J commands on output from *nix CLI pipes.

License:GNU General Public License v3.0