JX is a tiny utility script to eXecute J commands on the output from *nix command-line pipes. The syntax is,
linux-command1 | linux-command2 ... | jx `verb`
The first argument, `verb`, is any J command to be executed on the piped data (stdin
in J parlance). Typical use cases are provided in the Examples section.
The following packages are automatically preloaded each time jx is called,
- stats
- stats/distribs
- math/fftw
More libraries can easily be added to the beginning of the script. Check the addons page for further information.
Currently jx can only deal with numeric data types. Use it on other types at your own risk!
J needs to be installed and available on the system. ijconsole is used in jx and is assumed to be in /usr/bin/ijconsole. Replace the path in the script if this is not the case on your system.
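Before editing the script, it may help to confirm where ijconsole actually lives. The sketch below is only an illustration: the find_ijconsole helper is a hypothetical name and is not part of jx. It checks the default path first and then falls back to whatever is on the PATH.

```shell
# Hypothetical helper (not part of jx): print the location of ijconsole.
# The optional first argument overrides the default path assumed by jx.
find_ijconsole() {
  default_path="${1:-/usr/bin/ijconsole}"
  if [ -x "$default_path" ]; then
    printf '%s\n' "$default_path"
  elif command -v ijconsole >/dev/null 2>&1; then
    command -v ijconsole
  else
    echo "ijconsole not found; install J or edit the path in jx" >&2
    return 1
  fi
}

# Report the path, or the error message if J is not installed.
find_ijconsole || true
```

If the printed path differs from /usr/bin/ijconsole, that is the value to substitute in the script.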
This script has only been tested on WSL with Ubuntu 18.04.1 LTS (bionic).
J is a wonderful tool for rapid prototyping. Its vast ecosystem and terse notation make it especially well suited to command-line data analysis and transformations. However, I could not find any solutions that run J commands on CSV-type data.
The page on shell scripting provided an excellent starting point. I hope that this script benefits other beginners like me who are interested in leveraging the power of J. There are few languages that provide such potency in so few characters. Happy scripting!
Download the file called jx and give it execute permissions (chmod +x jx).
Alternatively, open the package.org file in Emacs, tangle it to get the jx file, and give it execute permissions.
Ensure the shell is set to bash,
bash
A typical use case might be,
for i in $(seq 1 10); do echo $i; done | jx "*/"
3628800
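In J, */ inserts multiplication between the items it receives, so the pipeline above returns the product of 1 through 10 (that is, 10 factorial). As a quick sanity check that does not need J, the same product can be computed in plain bash; this is just a cross-check, not how jx works internally.

```shell
# Multiply the integers 1..10 together using shell arithmetic.
product=1
for i in $(seq 1 10); do
  product=$((product * i))
done
echo "$product"   # prints 3628800
```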
Let’s consider a more realistic example using the classic iris dataset (see accompanying .csv for a copy). The excellent csvkit utility is used to extract information from the CSV.
The dataset contains the following columns,
csvcut -n iris.csv
1:
2: Sepal.Length
3: Sepal.Width
4: Petal.Length
5: Petal.Width
6: Species
Columns 2-5 are numeric,
csvcut -c 2-5 iris.csv | head
Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
5.1,3.5,1.4,0.2
4.9,3,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5,3.6,1.4,0.2
5.4,3.9,1.7,0.4
4.6,3.4,1.4,0.3
5,3.4,1.5,0.2
4.4,2.9,1.4,0.2
(+/ % #) is the mean in J. So to calculate the mean of the columns, jx is used as follows,
csvcut -c 2-5 iris.csv | tail -n +2 | ./jx "(+/ % #)"
5.84333,3.05733,3.758,1.19933
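For readers new to J, (+/ % #) reads as "sum divided by count", applied down each column. A rough equivalent in awk is sketched below for comparison. The column_means function name is made up for this illustration and is not part of jx, and the input is just the first three rows of the listing above.

```shell
# Hypothetical helper: per-column mean of comma-separated numeric rows.
column_means() {
  awk -F, '
    {
      ncols = NF
      for (c = 1; c <= NF; c++) sum[c] += $c   # running column sums
    }
    END {
      if (NR == 0) exit                        # nothing to average
      line = ""
      for (c = 1; c <= ncols; c++) {
        sep = (c > 1) ? "," : ""
        line = line sep sprintf("%g", sum[c] / NR)
      }
      print line
    }'
}

# First three rows of Sepal.Length and Sepal.Width from the listing above.
printf '5.1,3.5\n4.9,3\n4.7,3.2\n' | column_means   # prints 4.9,3.23333
```

The J version expresses the same idea in seven characters, which is the main attraction of jx for this kind of work.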
Since the stats addon is preloaded, various statistics could also be calculated as,
csvcut -c 2-5 iris.csv | tail -n +2 | jx "(#, mean, median, stddev)"
150,150,150,150
5.84333,3.05733,3.758,1.19933
5.8,3,4.35,1.3
0.828066,0.435866,1.7653,0.762238
Or using dstat (descriptive statistics),
csvcut -c 2-5 iris.csv | tail -n +2 | jx "dstat"
"sample size: 150","sample size: 150","sample size: 150","sample size: 150"
"minimum: 4.3","minimum: 2","minimum: 1","minimum: 0.1"
"maximum: 7.9","maximum: 4.4","maximum: 6.9","maximum: 2.5"
"median: 5.8","median: 3","median: 4.35","median: 1.3"
"mean: 5.84333","mean: 3.05733","mean: 3.758","mean: 1.19933"
"std devn: 0.828066","std devn: 0.435866","std devn: 1.7653","std devn: 0.762238"
"skewness: 0.311753","skewness: 0.315767","skewness: _0.272128","skewness: _0.101934"
"kurtosis: 2.42643","kurtosis: 3.18098","kurtosis: 1.60446","kurtosis: 1.66393"
There are many more functions in the addon. See the stats page for further details.
One of the major advantages of using jx is that the entire J ecosystem is available. This facilitates calculations not normally available in many other command-line statistical packages.
For example, the cumulative standard deviation is easily calculated as,
csvcut -c 2-5 iris.csv | tail -n +2 | jx "stddev \\" | head -n 10
0,0,0,0
0.141421,0.353553,0,0
0.2,0.251661,0.057735,3.39935e_17
0.221736,0.216025,0.0816497,0
0.207364,0.258844,0.0707107,0
0.288097,0.343026,0.13784,0.0816497
0.294392,0.313202,0.127242,0.0786796
0.274838,0.290012,0.119523,0.0744024
0.308671,0.316228,0.113039,0.0707107
0.291357,0.307137,0.108012,0.0788811
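The trailing \ applies stddev to successively longer prefixes of the rows, which is why the first line is all zeros and each later line reflects one more observation. To make that concrete, here is a sketch of the same running calculation in awk, assuming the sample (n - 1) definition of standard deviation that the output above is consistent with. The cumulative_stddev name is hypothetical, and the running update is Welford's method rather than anything jx does internally.

```shell
# Hypothetical helper: running sample standard deviation per column,
# using a Welford-style update of the mean and sum of squared deviations.
cumulative_stddev() {
  awk -F, '
    {
      line = ""
      for (c = 1; c <= NF; c++) {
        delta = $c - mean[c]
        mean[c] += delta / NR
        m2[c] += delta * ($c - mean[c])
        sd = (NR > 1) ? sqrt(m2[c] / (NR - 1)) : 0   # one row gives 0
        sep = (c > 1) ? "," : ""
        line = line sep sprintf("%g", sd)
      }
      print line
    }'
}

# First three rows of Sepal.Length and Sepal.Width from the iris listing.
printf '5.1,3.5\n4.9,3\n4.7,3.2\n' | cumulative_stddev
# prints:
# 0,0
# 0.141421,0.353553
# 0.2,0.251661
```

These values agree with the first two columns of the jx output above.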
To calculate the fft,
csvcut -c 2-5 iris.csv | tail -n +2 | jx "fftw" | head -n 10
876.5,458.6,563.7,179.9
_6.1619j53.5655,20.0516j_17.2772,_37.1193j145.149,_9.47561j62.0101
1.06414j32.9092,_7.05433j_9.32116,9.866j74.9139,3.54867j31.8348
_10.4029j_2.34397,_1.39779j_0.140479,_5.56232j_2.05679,1.27583j_0.313196
_3.82541j11.3016,1.11997j_3.65419,_10.001j32.7223,_0.713453j16.687
_7.80278j14.757,_6.34392j_0.147459,_1.59774j31.7484,_0.652352j17.7651
4.35145j_1.51585,0.4113j1.55174,4.72981j_2.98952,_0.141559j3.7895
_5.89409j6.59066,0.728365j_1.2031,_7.02916j14.2964,_3.42976j8.93904
0.639293j16.2784,_0.697862j2.37043,_2.56524j20.8029,_1.72254j8.12098
5.64899j_3.28078,6.46114j0.0480785,2.27507j_7.72158,0.337604j_1.55001
This is a tiny but useful tool that should nicely supplement other solutions available today. Of course, there are many avenues for expanding it. Please fork this repo and continue development!
This code is released under GPLv3.