pjs is a command-line tool for filtering and transforming text, similar to awk. You provide it with
powerful one-line snippets written in vanilla JavaScript. It supports many input formats, including
plain text, CSV, JSON, HTML, and XML.
pjs works by generating a complete JS program from the provided script, and feeding it each line of
standard input. The statically generated program can be reviewed with --explain.
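As a rough mental model of that per-line loop, the sketch below runs a snippet against each line in plain JavaScript. It is a hypothetical illustration, not the program pjs actually generates: the `runLines` helper and its output rules (a string is printed, `true` prints the original line, and `false`, `null`, or `undefined` print nothing) are assumptions inferred from the examples in this README.

```javascript
// Hypothetical model of a per-line runner. This is NOT pjs's generated code;
// run pjs with --explain to see the real thing.
function runLines(inputText, script) {
  // Split into lines, ignoring a single trailing newline.
  const lines = inputText.replace(/\n$/, "").split("\n");
  const output = [];
  let COUNT = 0; // assumed: 1-based number of the line being processed

  for (const _ of lines) {
    COUNT++;
    const $ = _.trim().split(/\s+/); // assumed: whitespace-separated fields
    const result = script(_, $, COUNT);

    if (result === true) {
      output.push(_); // a true result keeps the line unchanged
    } else if (result !== false && result != null) {
      output.push(String(result)); // any other value is printed
    }
  }
  return output;
}

// Roughly what `pjs '_.toUpperCase()'` does:
console.log(runLines("hello\nworld\n", (_) => _.toUpperCase()).join("\n"));
// Roughly what `pjs 'COUNT % 2 == 1'` does:
console.log(runLines("a\nb\nc\n", (_, $, COUNT) => COUNT % 2 == 1).join("\n"));
```

Passing the snippet as a function keeps the sketch short; a real code generator would instead splice the snippet's source text into the program.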
See the examples section below for what pjs can do. For complete documentation, read
the manual or run man pjs.
Install the pjs command with npm:
npm install -g pjs-tool
If npm is not available in your environment, you can download a standalone
executable. You will still need node installed.
Click on an example to run it in your browser at the pjs playground.
Convert a file to upper-case:
cat input.txt | pjs '_.toUpperCase()'
Print the second field of each line (in this example, the PIDs):
ps aux | pjs '$1'
Print all fields after the 10th (in this example, the process names):
ps aux | pjs '$.slice(10).join(" ")'
Remove trailing whitespace from each line in a file:
cat document.txt | pjs '_.replace(/\s*$/, "")'
Given a list of numbers, print only numbers greater than 5:
seq 1 10 | pjs '_ > 5'
Given a list of numbers, print only even numbers:
seq 1 10 | pjs '_ % 2 == 0'
Print the last 4 lines of a file (like tail):
seq 1 10 | pjs --after 'LINES.slice(-4).join("\n")'
Print every other line of a file:
cat input.txt | pjs 'COUNT % 2 == 1'
Given a list of filenames, print the files that actually exist:
cat filenames.txt | pjs 'fs.existsSync(_)'
Given a list of filenames, print the files that are under one kilobyte in size:
cat filenames.txt | pjs 'fs.statSync(_).size < 1000'
Manually count the lines in the input (like wc -l):
cat input.txt | pjs '{ count++ }' --after 'count'
Same as above, but using the built-in COUNT variable:
cat input.txt | pjs --after 'COUNT'
Count the unique lines in the input:
cat input.txt | pjs --before 'let s = new Set()' '{ s.add(_) }' --after 's.size'
Manually sort the lines of the input (like sort):
cat input.txt | pjs --before 'let lines = []' '{ lines.push(_) }' --after 'lines.sort().join("\n")'
Same as above, but using the built-in LINES variable:
cat input.txt | pjs --after 'LINES.sort().join("\n")'
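The --before, per-line, and --after snippets above follow a three-phase shape much like awk's BEGIN, main, and END blocks. The plain JavaScript below is a hedged sketch of that shape for the unique-line count and the sort. The sample input is made up, and `LINES` here is an ordinary array standing in for the built-in variable, not pjs's own implementation.

```javascript
// Made-up sample input; in pjs this would arrive on standard input.
const input = "pear\napple\npear\nfig\n";

// Stand-in for the built-in LINES variable: every input line, in order.
const LINES = input.replace(/\n$/, "").split("\n");

// Phase 1 (like --before): set up state once.
const s = new Set();

// Phase 2 (the per-line script): runs once for each line.
for (const _ of LINES) {
  s.add(_);
}

// Phase 3 (like --after): runs once at the end and produces the final output.
const uniqueCount = s.size;
const sorted = [...LINES].sort().join("\n");

console.log(uniqueCount); // 3
console.log(sorted);      // apple, fig, pear, pear (one per line)
```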
Given a grades.csv file that looks like this:
name,subject,grade
Bob,physics,43
Alice,biology,75
Alice,physics,90
David,biology,85
Clara,physics,78
Print only the third column:
cat grades.csv | pjs --csv '$2'
Print the grades using the column header:
cat grades.csv | pjs --csv-header '_.grade'
Print the names of students taking biology:
cat grades.csv | pjs --csv-header '_.subject == "biology" && _.name'
Print the average grade across all courses:
cat grades.csv | pjs --csv-header '{ sum += Number(_.grade) }' --after 'sum/COUNT'
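To make the --csv-header examples concrete, here is a minimal plain JavaScript sketch that turns each row into an object keyed by the header names, then repeats the biology filter and the average. The `parseCsvWithHeader` helper is hypothetical and deliberately naive (it assumes no quoted fields or embedded commas); it is not how pjs parses CSV.

```javascript
// Sample data copied from grades.csv above.
const csv = `name,subject,grade
Bob,physics,43
Alice,biology,75
Alice,physics,90
David,biology,85
Clara,physics,78`;

// Hypothetical helper: naive CSV parsing, one object per data row.
function parseCsvWithHeader(text) {
  const [headerLine, ...rowLines] = text.trim().split("\n");
  const header = headerLine.split(",");
  return rowLines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(header.map((key, i) => [key, cells[i]]));
  });
}

const rows = parseCsvWithHeader(csv);

// Names of students taking biology.
const biologyStudents = rows
  .filter((_) => _.subject == "biology")
  .map((_) => _.name);

// Average grade: cells are strings, so convert before summing.
let sum = 0;
for (const _ of rows) sum += Number(_.grade);
const average = sum / rows.length;

console.log(biologyStudents.join("\n")); // Alice, David (one per line)
console.log(average);                    // 74.2
```

The explicit `Number(...)` conversion mirrors the average example above: without it, `+=` would concatenate strings instead of adding.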
Given a users.json file that looks like this:
{
  "version": 123,
  "items": [
    {"name": {"first": "Winifred", "last": "Frost"}, "age": 42},
    {"name": {"first": "Miles", "last": "Fernandez"}, "age": 15},
    {"name": {"first": "Kennard", "last": "Floyd"}, "age": 20},
    {"name": {"first": "Lonnie", "last": "Davis"}, "age": 78},
    {"name": {"first": "Duncan", "last": "Poole"}, "age": 36}
  ]
}
Print the value of the "version" field:
cat users.json | pjs --json '.version' _
Print the full name of each user:
cat users.json | pjs --json '.items[].name' '_.first+" "+_.last'
Print the users that are 21 or older:
cat users.json | pjs --json '.items[]' '_.age >= 21'
Print the ages of the first 3 users only:
cat users.json | pjs --json '.items[0:3]' '_.age'
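For readers more familiar with JavaScript than with JSON filter syntax, the sketch below shows plain array operations that select the same data as the --json filters above. It assumes that `.items[]` iterates the array and that `.items[0:3]` is an end-exclusive slice, which matches the "first 3 users" description; it illustrates the selection logic only, not pjs's filter implementation.

```javascript
// Sample data copied from users.json above.
const data = {
  version: 123,
  items: [
    { name: { first: "Winifred", last: "Frost" }, age: 42 },
    { name: { first: "Miles", last: "Fernandez" }, age: 15 },
    { name: { first: "Kennard", last: "Floyd" }, age: 20 },
    { name: { first: "Lonnie", last: "Davis" }, age: 78 },
    { name: { first: "Duncan", last: "Poole" }, age: 36 },
  ],
};

// '.version' with the script `_`
const version = data.version;

// '.items[].name' with the script `_.first+" "+_.last`
const fullNames = data.items
  .map((item) => item.name)
  .map((_) => _.first + " " + _.last);

// '.items[]' with the script `_.age >= 21`
const adults = data.items.filter((_) => _.age >= 21);

// '.items[0:3]' with the script `_.age` (assumed end-exclusive slice)
const firstThreeAges = data.items.slice(0, 3).map((_) => _.age);

console.log(version);                 // 123
console.log(fullNames.join("\n"));
console.log(adults.map((u) => u.name.first).join(", ")); // Winifred, Lonnie, Duncan
console.log(firstThreeAges);          // [ 42, 15, 20 ]
```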
Query a web API for users:
curl -A "" 'https://www.instagram.com/web/search/topsearch/?query=John' |
pjs --json '.users[].user' '`@${_.username} (${_.full_name})`'
Print the text of all <h1> and <h2> elements on a web page:
cat page.html | pjs --html 'h1,h2' '_.text'
Print the URLs of all images on a web page:
cat page.html | pjs --html 'img' '_.attr.src'
Scrape headlines off a news site using a complex CSS selector:
curl https://news.ycombinator.com | pjs '_.text' \
--html 'table table tr:nth-last-of-type(n+2) td:nth-child(3)'
Print all links in <h2> elements with URLs containing the word "blog":
curl https://aduros.com | pjs --html 'h2 a' '_.attr.href.includes("blog") && _.attr.href'
Print a readable summary of an RSS feed:
curl https://aduros.com/index.xml | pjs --xml 'item' \
'_.querySelector("title").text + " --> " + _.querySelector("link").text'
Bulk rename *.jpeg files to *.jpg:
find . -name '*.jpeg' | pjs 'let f = path.parse(_);
    fs.renameSync(_, path.join(f.dir, f.name+".jpg"))'
Print the longest line in the input:
cat input.txt | pjs 'if (_.length > m) { m = _.length; longest = _ }' --after 'longest'
Count the words in the input:
cat input.txt | pjs '{ words += $.length }' --after 'words'
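Count the unique words in the input:
cat input.txt | pjs --before 'let seen = new Set()' '{ for (const w of $) seen.add(w) }' --after 'seen.size'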
Using a script file instead of command-line arguments:
echo '
BEFORE: {
print("Starting up!")
}
_.toUpperCase()
AFTER: "Total lines: "+COUNT
' > my-uppercase.js
cat document.txt | pjs -f my-uppercase.js
Adding a shebang to the above script to make it self-executable:
echo '#!/usr/bin/env -S pjs -f' | cat - my-uppercase.js > my-uppercase
chmod +x my-uppercase
./my-uppercase document.txt
Scrape an entire online store, outputting a JSON stream for later processing:
for page in $(seq 1 50); do
    >&2 echo "Scraping page $page..."
    curl -s "http://books.toscrape.com/catalogue/page-$page.html" |
        pjs --html '.product_pod h3 a' '"http://books.toscrape.com/catalogue/"+_.attr.href' |
        while read -r url; do
            >&2 echo "Scraping item details from $url"
            curl -s "$url" | pjs --html '.product_page' 'JSON.stringify({
                title: _.querySelector(".product_main h1").text,
                description: _.querySelector("#product_description + p").text})'
        done
done