aduros / pjs

An awk-like command-line tool for processing text, CSV, JSON, HTML, and XML.

pjs

pjs is a command-line tool for filtering and transforming text, similar to awk. You write powerful one-line snippets in vanilla JavaScript, and pjs applies them to its input. It supports many input formats, including plain text, CSV, JSON, HTML, and XML.

pjs works by generating a complete JS program from the provided script, then feeding each line of standard input to that program. Because the program is generated statically, it can be reviewed with --explain.
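To make that model concrete, here is a minimal JavaScript sketch of a per-line loop in the same spirit. It is not the program that --explain prints. The function name runPerLine, the whitespace field splitting used for $, and the output rule (true prints the line; false, null, or undefined print nothing; any other value is printed) are assumptions inferred from the examples below.

```javascript
// Hypothetical model of a pjs-style per-line loop. Not pjs's generated code.
// `script` receives the current line (_), its fields ($), the running line
// count (COUNT), and the lines seen so far (LINES).
function runPerLine(inputLines, script) {
  const LINES = [];
  const output = [];
  let COUNT = 0;
  for (const _ of inputLines) {
    COUNT++;
    LINES.push(_);
    // Assumption: fields are runs of non-whitespace, indexed from 0.
    const $ = _.trim().split(/\s+/);
    const result = script(_, $, COUNT, LINES);
    if (result === true) {
      output.push(_); // a passing filter prints the line itself
    } else if (result !== false && result !== null && result !== undefined) {
      output.push(String(result)); // any other value is printed
    }
  }
  return output;
}

// Usage: the filter '_ > 5' over input shaped like `seq 1 10`.
const seq = Array.from({ length: 10 }, (_, i) => String(i + 1));
console.log(runPerLine(seq, (_) => _ > 5).join("\n"));
```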

The examples section below shows what pjs can do. For complete documentation, read the manual or run man pjs.

Installing

Install the pjs command with npm:

npm install -g pjs-tool

If npm is not available in your environment, you can download a standalone executable instead. You will still need node installed.

Examples

Click on an example to run it in your browser at the pjs playground.

Transforming Examples

Convert a file to upper-case:

cat input.txt | pjs '_.toUpperCase()'

Print the second field of each line (in this example, the PIDs):

ps aux | pjs '$1'

Print all fields after the 10th (in this example, the process names):

ps aux | pjs '$.slice(10).join(" ")'

Remove trailing whitespace from each line in a file:

cat document.txt | pjs '_.replace(/\s*$/, "")'
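For comparison, the sketch below applies the same expressions in plain JavaScript to a single made-up line in the shape of ps aux output. It assumes fields are split on runs of whitespace with leading and trailing space ignored; pjs's exact splitting rules may differ.

```javascript
// A made-up line in the shape of `ps aux` output (hypothetical data).
const line = "root  1234  0.0  0.1  16000  4000 ?  Ss  10:00  0:00 /usr/sbin/sshd -D   ";
// Assumed field splitting: runs of whitespace, ignoring leading/trailing space.
const $ = line.trim().split(/\s+/);

const upper = line.toUpperCase();         // '_.toUpperCase()'
const pid = $[1];                         // '$1': the second field
const command = $.slice(10).join(" ");    // '$.slice(10).join(" ")'
const trimmed = line.replace(/\s*$/, ""); // '_.replace(/\s*$/, "")'

console.log(pid);     // 1234
console.log(command); // /usr/sbin/sshd -D
console.log(JSON.stringify(trimmed));
```

Rejoining with a single space also collapses any runs of spaces inside the command itself.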

Filtering Examples

Given a list of numbers, print only numbers greater than 5:

seq 1 10 | pjs '_ > 5'

Given a list of numbers, print only even numbers:

seq 1 10 | pjs '_ % 2 == 0'
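Both filters work on numbers even though every input line is a string: when a string is compared with, or used arithmetically against, a number, JavaScript converts it to a number first. The sketch below uses a plain array in place of seq output to show this, along with the string-to-string comparison where no conversion happens.

```javascript
// Lines arrive as strings, like the output of `seq`.
const lines = ["1", "2", "5", "6", "9", "10"];

// String vs. number: the string is converted to a number first.
const greaterThan5 = lines.filter((_) => _ > 5);  // ["6", "9", "10"]
const even = lines.filter((_) => _ % 2 == 0);     // ["2", "6", "10"]

// String vs. string: compared character by character, so "10" < "5".
const lexical = lines.filter((_) => _ > "5");     // ["6", "9"]

console.log(greaterThan5, even, lexical);
```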

Print the last 4 lines of the input (like tail -n 4):

seq 1 10 | pjs --after 'LINES.slice(-4).join("\n")'

Print every other line of a file:

cat input.txt | pjs 'COUNT % 2 == 1'

Given a list of filenames, print the files that actually exist:

cat filenames.txt | pjs 'fs.existsSync(_)'

Given a list of filenames, print the files that are under one kilobyte in size:

cat filenames.txt | pjs 'fs.statSync(_).size < 1000'
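One caveat with the last example: fs.statSync throws an error when a path does not exist, so a missing file in filenames.txt will likely stop the run with an error. A guarded predicate is sketched below in plain Node.js. The helper name isUnderOneKilobyte is hypothetical, and the throwIfNoEntry option requires Node.js 14.17 or later.

```javascript
const fs = require("fs");

// Hypothetical helper: true if `name` exists and is smaller than `limit` bytes.
// With throwIfNoEntry: false (Node.js 14.17+), fs.statSync returns undefined
// for a missing path instead of throwing ENOENT.
function isUnderOneKilobyte(name, limit = 1000) {
  const stats = fs.statSync(name, { throwIfNoEntry: false });
  return stats !== undefined && stats.size < limit;
}

console.log(isUnderOneKilobyte("/no/such/file")); // false
```

Inside pjs, the same guard should also work inline as 'fs.existsSync(_) && fs.statSync(_).size < 1000', using only the fs calls the examples above already rely on.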

Summarizing Examples

Manually count the lines in the input (like wc -l):

cat input.txt | pjs '{ count++ }' --after 'count'

Same as above, but using the built-in COUNT variable:

cat input.txt | pjs --after 'COUNT'

Count the unique lines in the input:

cat input.txt | pjs --before 'let s = new Set()' '{ s.add(_) }' --after 's.size'
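The count example never declares count, so pjs presumably declares and initializes it behind the scenes. In plain JavaScript an uninitialized variable is undefined, and undefined + 1 is NaN, so the equivalent sketch below initializes it explicitly. The function name summarize is hypothetical.

```javascript
// Hypothetical plain-JS equivalent of the counting examples above.
function summarize(lines) {
  let count = 0;        // '{ count++ }' ... --after 'count'
  const s = new Set();  // --before 'let s = new Set()'
  for (const _ of lines) {
    count++;
    s.add(_);           // '{ s.add(_) }'
  }
  return { count, unique: s.size }; // --after 'count' / --after 's.size'
}

console.log(summarize(["apple", "pear", "apple"])); // { count: 3, unique: 2 }
```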

Manually sort the lines of the input (like sort):

cat input.txt | pjs --before 'let lines = []' '{ lines.push(_) }' --after 'lines.sort().join("\n")'

Same as above, but using the built-in LINES variable:

cat input.txt | pjs --after 'LINES.sort().join("\n")'
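Array.prototype.sort with no comparator orders strings by UTF-16 code units, so uppercase letters sort before lowercase ones and numbers sort as text. That is closer to sort under the C locale than to a locale-aware or numeric sort. The sketch below shows the default order alongside two comparators that could be passed to LINES.sort instead; which one matches your sort depends on its locale and flags.

```javascript
const words = ["banana", "apple", "Cherry"];
const numbers = ["10", "9", "1"];

// Default: UTF-16 code unit order ("C" < "a" < "b").
const byCodeUnit = [...words].sort();                               // ["Cherry", "apple", "banana"]

// Locale-aware comparison via Intl.Collator.
const byLocale = [...words].sort(new Intl.Collator("en").compare); // ["apple", "banana", "Cherry"]

// Numeric comparison, similar to `sort -n`.
const numericDefault = [...numbers].sort();                         // ["1", "10", "9"]
const numeric = [...numbers].sort((a, b) => a - b);                 // ["1", "9", "10"]

console.log(byCodeUnit, byLocale, numericDefault, numeric);
```

For example, a numeric version of the built-in LINES example would be --after 'LINES.sort((a, b) => a - b).join("\n")'.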

CSV Examples

Given a grades.csv file that looks like this:

name,subject,grade
Bob,physics,43
Alice,biology,75
Alice,physics,90
David,biology,85
Clara,physics,78

Print only the third column:

cat grades.csv | pjs --csv '$2'

Print the grades using the column header:

cat grades.csv | pjs --csv-header '_.grade'

Print the names of students taking biology:

cat grades.csv | pjs --csv-header '_.subject == "biology" && _.name'

Print the average grade across all courses:

cat grades.csv | pjs --csv-header '{ sum += Number(_.grade) }' --after 'sum/COUNT'
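The sketch below reproduces the --csv-header examples in plain JavaScript on the grades.csv sample. It assumes each record is an object keyed by column name, as _.grade suggests, and that values arrive as strings, which is why the average example wraps the grade in Number(). It also uses the number of data rows in place of COUNT, assuming COUNT excludes the header row. The comma splitting is deliberately naive and, unlike a real CSV parser, does not handle quoted fields.

```javascript
// The grades.csv sample from above.
const csv = `name,subject,grade
Bob,physics,43
Alice,biology,75
Alice,physics,90
David,biology,85
Clara,physics,78`;

// Naive parsing: split lines, then cells on commas (no quoting support).
const [headerLine, ...rows] = csv.split("\n");
const header = headerLine.split(",");
// Assumed record shape: one object per row, keyed by header name, string values.
const records = rows.map((row) => {
  const cells = row.split(",");
  return Object.fromEntries(header.map((key, i) => [key, cells[i]]));
});

const grades = records.map((_) => _.grade);        // '_.grade'
const biology = records                             // '_.subject == "biology" && _.name'
  .filter((_) => _.subject == "biology")
  .map((_) => _.name);
let sum = 0;
for (const _ of records) sum += Number(_.grade);    // '{ sum += Number(_.grade) }'
const average = sum / records.length;               // --after 'sum/COUNT'

console.log(grades, biology, average);
```

With the sample data, the average is 74.2 (371 divided by 5 rows).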

JSON Examples

Given a users.json file that looks like this:

{
  "version": 123,
  "items": [
    {"name": {"first": "Winifred", "last": "Frost"}, "age": 42},
    {"name": {"first": "Miles", "last": "Fernandez"}, "age": 15},
    {"name": {"first": "Kennard", "last": "Floyd"}, "age": 20},
    {"name": {"first": "Lonnie", "last": "Davis"}, "age": 78},
    {"name": {"first": "Duncan", "last": "Poole"}, "age": 36}
  ]
}

Print the value of the "version" field:

cat users.json | pjs --json '.version' _

Print the full name of each user:

cat users.json | pjs --json '.items[].name' '_.first+" "+_.last'

Print the users who are 21 or older:

cat users.json | pjs --json '.items[]' '_.age >= 21'

Print the ages of the first 3 users only:

cat users.json | pjs --json '.items[0:3]' '_.age'
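The --json paths resemble jq syntax. Assuming they select values the way their names suggest, with [] iterating over an array and [0:3] taking a half-open slice like Array.prototype.slice(0, 3), the four examples above correspond roughly to the following plain JavaScript over users.json.

```javascript
// The users.json sample from above, inlined.
const users = {
  version: 123,
  items: [
    { name: { first: "Winifred", last: "Frost" }, age: 42 },
    { name: { first: "Miles", last: "Fernandez" }, age: 15 },
    { name: { first: "Kennard", last: "Floyd" }, age: 20 },
    { name: { first: "Lonnie", last: "Davis" }, age: 78 },
    { name: { first: "Duncan", last: "Poole" }, age: 36 },
  ],
};

const version = users.version;                                    // '.version' _
const fullNames = users.items
  .map((item) => item.name)                                       // '.items[].name'
  .map((_) => _.first + " " + _.last);                            // '_.first+" "+_.last'
const adults = users.items.filter((_) => _.age >= 21);           // '.items[]' '_.age >= 21'
const firstThreeAges = users.items.slice(0, 3).map((_) => _.age); // '.items[0:3]' '_.age'

console.log(version, fullNames, adults.length, firstThreeAges);
```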

Query a web API for users:

curl -A "" 'https://www.instagram.com/web/search/topsearch/?query=John' |
    pjs --json '.users[].user' '`@${_.username} (${_.full_name})`'

HTML/XML Examples

Print the text of all <h1> and <h2> elements on a web page:

cat page.html | pjs --html 'h1,h2' '_.text'

Print the URLs of all images on a web page:

cat page.html | pjs --html 'img' '_.attr.src'

Scrape headlines off a news site using a complex CSS selector:

curl https://news.ycombinator.com | pjs '_.text' \
    --html 'table table tr:nth-last-of-type(n+2) td:nth-child(3)'

Print all links in <h2> elements with URLs containing the word "blog":

curl https://aduros.com | pjs --html 'h2 a' '_.attr.href.includes("blog") && _.attr.href'

Print a readable summary of an RSS feed:

curl https://aduros.com/index.xml | pjs --xml 'item' \
    '_.querySelector("title").text + " --> " + _.querySelector("link").text'

Advanced Examples

Bulk rename *.jpeg files to *.jpg:

find . -name '*.jpeg' | pjs 'let f = path.parse(_);
    fs.renameSync(_, path.join(f.dir, f.name+".jpg"))'
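Because this one-liner renames files immediately, it can help to preview the mapping first. The sketch below isolates the name computation in plain Node.js. The function name jpegToJpg is hypothetical, and path.posix is used so the result matches the forward-slash paths that find prints.

```javascript
// POSIX path helpers, matching the "./dir/name.jpeg" paths that find prints.
const path = require("path").posix;

// Hypothetical helper: the target name the one-liner passes to fs.renameSync.
function jpegToJpg(file) {
  const f = path.parse(file); // e.g. { dir: "./photos", name: "cat", ext: ".jpeg", ... }
  return path.join(f.dir, f.name + ".jpg");
}

console.log(jpegToJpg("./photos/cat.jpeg")); // photos/cat.jpg
```

path.join normalizes away the leading "./", which does not change where the file ends up. A pjs dry run could print the mapping instead of renaming, for example with 'let f = path.parse(_); _ + " -> " + path.join(f.dir, f.name+".jpg")', assuming pjs prints the value of the final expression as it does in the other examples.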

Print the longest line in the input:

cat input.txt | pjs 'if (_.length > m) { m = _.length; longest = _ }' --after 'longest'

Count the words in the input:

cat input.txt | pjs '{ words += $.length }' --after 'words'

Count the unique words in the input:

cat input.txt | pjs --before 'let words = new Set()' 'for (let word of $) words.add(word)' --after 'words.size'
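In plain JavaScript, the three examples above amount to the loop sketched below. Here m and words are initialized explicitly; the pjs versions appear to rely on pjs initializing them. Splitting words on whitespace is an assumption about how $ is built, and the function name textStats is hypothetical. Because the comparison is strictly greater-than, the first of several equally long lines wins.

```javascript
// Hypothetical plain-JS equivalent of the longest-line and word-count examples.
function textStats(lines) {
  let m = 0;          // length of the longest line so far
  let longest = "";
  let words = 0;
  const unique = new Set();
  for (const _ of lines) {
    if (_.length > m) { m = _.length; longest = _; } // strict >: first of ties wins
    // Assumption: words are runs of non-whitespace; empty lines have none.
    const $ = _.split(/\s+/).filter((w) => w !== "");
    words += $.length;
    for (const word of $) unique.add(word);
  }
  return { longest, words, uniqueWords: unique.size };
}

console.log(textStats(["the cat", "the quick brown fox", ""]));
```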

Using a script file instead of command-line arguments:

echo '
    BEFORE: {
        print("Starting up!")
    }
    _.toUpperCase()
    AFTER: "Total lines: "+COUNT
' > my-uppercase.js

cat document.txt | pjs -f my-uppercase.js
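BEFORE: and AFTER: in this script are ordinary JavaScript labels (a labeled block and a labeled expression statement), so the file is still syntactically valid JavaScript; presumably pjs uses the labels to tell the sections apart. The check below illustrates only the syntax point: new Function parses its body and throws a SyntaxError if it is invalid, without running it. The helper name parsesAsJavaScript is hypothetical.

```javascript
// The body of my-uppercase.js from above.
const script = `
    BEFORE: {
        print("Starting up!")
    }
    _.toUpperCase()
    AFTER: "Total lines: "+COUNT
`;

// Parses the source as a function body without executing it.
function parsesAsJavaScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parsesAsJavaScript(script)); // true
```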

Adding a shebang to the above script to make it directly executable (single quotes stop an interactive bash from treating ! as a history expansion):

echo '#!/usr/bin/env -S pjs -f' | cat - my-uppercase.js > my-uppercase
chmod +x my-uppercase

./my-uppercase document.txt

Scrape an entire online store, outputting a JSON stream for later processing:

for page in $(seq 1 50); do
    >&2 echo "Scraping page $page..."
    curl -s "http://books.toscrape.com/catalogue/page-$page.html" |
        pjs --html '.product_pod h3 a' '"http://books.toscrape.com/catalogue/"+_.attr.href' |
        while read -r url; do
            >&2 echo "Scraping item details from $url"
            curl -s "$url" | pjs --html '.product_page' 'JSON.stringify({
                title: _.querySelector(".product_main h1").text,
                description: _.querySelector("#product_description + p").text})'
        done
done
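JSON.stringify never emits raw newlines (newlines inside strings are escaped as \n), so each scraped record lands on its own line. That makes the stream easy to consume one line at a time; a minimal reader is sketched below, with the function name parseJsonLines and the sample records being hypothetical.

```javascript
// Hypothetical reader for a stream of one-JSON-object-per-line records.
function parseJsonLines(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "") // skip blank lines, e.g. a trailing newline
    .map((line) => JSON.parse(line));
}

// Two records shaped like the scraper's output (hypothetical values).
const sample =
  JSON.stringify({ title: "Book A", description: "First line\nSecond line" }) + "\n" +
  JSON.stringify({ title: "Book B", description: "Short" }) + "\n";

for (const book of parseJsonLines(sample)) console.log(book.title);
```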

About

License: ISC License


Languages

JavaScript 74.6%, Shell 22.5%, Makefile 2.9%