Alternative function calling conventions for use in Bash
Both functions and programs in Bash are commands that accept input and produce output in pretty much the same way. There are alternatives to this, but this is how it generally works:
(returnCode,stdout,stderr) = command(arguments,stdin,env)
You may feed arguments to the command. For example, you may feed the -l argument to the ls program:
$ ls -l
total 28
-rw-r--r-- 1 ontop ontop 3438 May 25 13:46 altcalconv.sh
...
The argument -l tells the ls program to use the long format for files and subfolders, and to display privileges and ownership for them.
You may feed an input stream to the command. For example:
$ echo hello | sed 's/ll/xx/'
hexxo
We feed the hello string as a stream to the sed command, asking it to replace the characters 'll' with 'xx' in its input.
You may feed environment variables to the command. For example:
$ env DISPLAY=:0 xeyes
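The same idea can be sketched without a graphical program. An assignment placed in front of a command is visible to that command only, not to the calling shell. The variable name GREETING is made up for this illustration:

```shell
#!/usr/bin/env bash
unset GREETING    # make sure the hypothetical variable starts out unset

# The assignment prefix passes GREETING to this one child process only.
child_view=$(GREETING=hello bash -c 'echo "$GREETING"')

# The calling shell itself never sees the variable.
parent_view=${GREETING-unset}

echo "child sees: $child_view"
echo "parent sees: $parent_view"
```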
The program will terminate with an integer value. Conventionally, this will be 0 in case of success, and 1, 2, or another non-zero integer value in case of failure. In the following example, we see that the grep program returns 0 when it has found a particular pattern in its search:
$ grep transmit altcalconv.sh ; echo "returned: $?"
function transmit {
returned: 0
The return code of the last command executed is available in the variable $?. In case grep cannot find the pattern, it returns 1:
$ grep alan altcalconv.sh ; echo "returned: $?"
returned: 1
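Since $? is overwritten by every following command, it is often more robust to let an if statement branch on the command directly. A minimal sketch, assuming a throwaway file created with mktemp as a stand-in for altcalconv.sh:

```shell
#!/usr/bin/env bash
# Create a throwaway file to search in (a stand-in for a real script).
sample=$(mktemp)
printf 'function transmit {\n}\n' > "$sample"

# The if statement branches on grep's return code; -q suppresses the matched lines.
if grep -q transmit "$sample"; then
    found_transmit=yes
else
    found_transmit=no
fi

if grep -q alan "$sample"; then
    found_alan=yes
else
    found_alan=no
fi

echo "transmit: $found_transmit, alan: $found_alan"
rm -f "$sample"
```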
The normal output stream for the program is stdout. For example, the echo command will output its arguments to stdout:
$ echo hello
hello
The error messages for the program go to stderr. For example, when ls cannot find a file, it will output:
$ ls some-file-not-there.txt ; echo "returned: $?"
ls: cannot access some-file-not-there.txt: No such file or directory
returned: 2
Note that stdout and stderr are both dumped in the terminal. So, you tend to see both intermixed, even though they are separate streams. If we mute stderr, you can see:
$ ls some-file-not-there.txt 2> /dev/null ; echo "returned: $?"
returned: 2
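Because the two streams are separate, they can also be routed to different destinations instead of being muted. The sketch below sends each stream to its own temporary file. The function name noisy is invented for the illustration:

```shell
#!/usr/bin/env bash
# A hypothetical command that writes to both streams and then fails.
noisy() {
    echo "regular output"
    echo "error output" >&2
    return 3
}

out_file=$(mktemp)
err_file=$(mktemp)

# 1> redirects stdout, 2> redirects stderr.
noisy 1> "$out_file" 2> "$err_file"
status=$?

stdout_text=$(cat "$out_file")
stderr_text=$(cat "$err_file")
rm -f "$out_file" "$err_file"

echo "returned: $status"
echo "stdout was: $stdout_text"
echo "stderr was: $stderr_text"
```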
One widespread but bad habit in shell programming is to forget to look at the return code of a command, or to forget to handle errors. Many shell scripts just continue with the next command after an error has occurred. Quite often, the success of the previous command was really needed for the next command. Otherwise, why execute a command at all, if it does not matter whether it went right? In that case, spare yourself the trouble and do not execute it at all, I would say.
Outside programs, even on different systems, could execute a command and want to know in what status it ended up:
command arg arg ...
if ! success; then
handleError errorMessage
fi
nextCommand arg arg ...
nextCommand arg arg ...
In case of success, the program should just continue and use the command's output. In case of failure, it should take a different route and take into account the command's error message on stderr.
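A runnable version of this pattern might look like the sketch below. The names handleError, fetchConfig and run, as well as the file path, are placeholders invented here and are not part of altcalconv.sh:

```shell
#!/usr/bin/env bash
handleError() {
    echo "error: $1" >&2
}

# Hypothetical command: succeeds only if the file it is given is readable.
fetchConfig() {
    cat "$1"
}

run() {
    local errors config
    errors=$(mktemp)
    # Branch on the return code, keeping stdout and stderr apart.
    if ! config=$(fetchConfig "$1" 2> "$errors"); then
        handleError "$(cat "$errors")"
        rm -f "$errors"
        return 1
    fi
    rm -f "$errors"
    echo "using config: $config"
}

run /no/such/file/for/this/sketch
echo "run returned: $?"
```

The next command only runs when the previous one has succeeded; otherwise the error message from stderr is handed to the error handler.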
In order to facilitate program behaviour that effectively handles error conditions, the altcalconv.sh script implements the capture function:
#!/usr/bin/env bash
source altcalconv.sh
function mycommand {
    echo "whatever $1 to stdout"
    stderr "whatever $1 to stderr"
    return 42
}
source <(capture ret out err := mycommand "hello friends")
echo "ret:$ret"
echo "out:$out"
echo "err:$err"
output:
ret:42
out:whatever hello friends to stdout
err:whatever hello friends to stderr
The capture function will capture the return code, stdout, and stderr of the mycommand "hello friends" command into three variables whose names you can choose.
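To give an idea of how such a function could work, here is a rough sketch. It is a hypothetical re-creation, not the code from altcalconv.sh: the name capture_sketch, the use of temporary files and the printf %q quoting are all assumptions of mine, and it does not support the local keyword.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the idea; not the code from altcalconv.sh.
capture_sketch() {
    local ret_name=$1 out_name=$2 err_name=$3
    shift 4    # drop the three names and the ':=' separator
    local out_file err_file ret
    out_file=$(mktemp)
    err_file=$(mktemp)
    # Run the command with each stream going to its own temporary file.
    "$@" > "$out_file" 2> "$err_file"
    ret=$?
    # Emit shell-quoted assignments for the caller to evaluate.
    printf '%s=%q\n' "$ret_name" "$ret"
    printf '%s=%q\n' "$out_name" "$(cat "$out_file")"
    printf '%s=%q\n' "$err_name" "$(cat "$err_file")"
    rm -f "$out_file" "$err_file"
}

mycommand() {
    echo "whatever $1 to stdout"
    echo "whatever $1 to stderr" >&2
    return 42
}

eval "$(capture_sketch ret out err := mycommand "hello friends")"
echo "ret:$ret"
echo "out:$out"
echo "err:$err"
```

Note that in this sketch the command runs inside a command substitution, so any variables it sets are lost to the caller; only the three captured values travel back.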
By the way, the expression:
source <(capture ret out err := mycommand "hello friends")
and:
eval $(capture ret out err := mycommand "hello friends")
are equivalent.
However, the eval version will produce better error messages in case of issues.
But then again, since the Church of the Anti-Eval Fanatics insists that eval is evil, while they have never successfully managed to also demonize the use of the source command (that would probably be another church with another doctrine), you may still want to use source instead of eval, and in that way avoid embarrassing accusations of heresy.
In the following example, the capture function will inject local variables in a function, instead of global ones:
function myfunction {
    source <(capture local returnCode output errors := mycommand "hello friends")
    if equal $returnCode 0 ; then
        echo "success"
        echo "this is the output: $output"
    else
        echo "failure, these are the error messages: $errors"
    fi
}
Note that Bash does not allow the use of the keyword local outside function bodies. Therefore, injecting local variables into the global namespace will lead to Bash reprimanding you.
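The reprimand is easy to reproduce. The sketch below runs the statement in a fresh bash process, so that it is guaranteed to be at top level, and compares that with the same statement inside a function. The function name inside is invented for the illustration:

```shell
#!/usr/bin/env bash
# Run the statement in a fresh bash so that it is guaranteed to be at top level.
bash -c 'local x=1' 2> /dev/null
top_level_status=$?

inside() {
    local x=1    # fine: we are inside a function body
}
inside
function_status=$?

echo "top level: $top_level_status, inside function: $function_status"
```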
Sometimes, you may wish that you could use a more traditional way of applying functions in Bash. The transmit and assign combo allows you to do this. For example:
#!/usr/bin/env bash
source altcalconv.sh
function func2 {
    transmit 4 3 12 $((99 + $1))
}
eval $(assign x1 x2 x3 x4 := func2 53)
echo "x1:$x1"
echo "x2:$x2"
echo "x3:$x3"
echo "x4:$x4"
output:
x1:4
x2:3
x3:12
x4:152
You can obviously also use the alternative syntax that is based on process substitution:
source <(assign x1 x2 x3 x4 := func2 53)
This 'function result transmission mechanism' closely simulates how other programming languages return results from the callee to the caller.
The transmit function creates a (temporary) stack, and pushes the values transmitted onto this stack. The assign function pops these results from the stack, assigns them to the variables mentioned, and then clears the stack.
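The push and pop mechanics could be sketched as follows. This is a hypothetical simplification, not the implementation in altcalconv.sh: the names transmit_sketch, assign_sketch and SKETCH_STACK are mine, the stack is a plain Bash array rather than whatever the real script uses, and the local keyword is not supported.

```shell
#!/usr/bin/env bash
# Hypothetical simplification; not the implementation in altcalconv.sh.
declare -a SKETCH_STACK=()

# Push all transmitted values onto the stack.
transmit_sketch() {
    SKETCH_STACK+=("$@")
}

# Usage: assign_sketch name1 name2 ... := command arg ...
assign_sketch() {
    local -a names=()
    while [ "$#" -gt 0 ] && [ "$1" != ":=" ]; do
        names+=("$1")
        shift
    done
    shift                   # drop the ':=' separator
    SKETCH_STACK=()
    "$@" >&2                # keep the callee's own stdout out of the assignments
    local i
    for i in "${!names[@]}"; do
        printf '%s=%q\n' "${names[$i]}" "${SKETCH_STACK[$i]-}"
    done
    SKETCH_STACK=()         # clear the stack again
}

func2() {
    transmit_sketch 4 3 12 $((99 + $1))
}

eval "$(assign_sketch x1 x2 x3 x4 := func2 53)"
echo "x1:$x1 x2:$x2 x3:$x3 x4:$x4"
```

Since assign_sketch runs inside a command substitution, the callee pushes onto the subshell's copy of the stack, and only the printed assignments reach the caller.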
Of course, it does not simulate the practice in truly native programs to use CPU registers as the top locations of the stack, in order to speed up the calling convention.
Since the calling convention relies on exactly one global stack data structure, it is not thread safe. Just like in the real world, each thread would need its own stack.
By the way, contrary to popular belief, it is most likely possible to use threads in Bash. You could try with ctypes.sh to load the pthread library, and use its functions to control your threads. If you intend to do that, you will have to modify the implementation of the altcalconv.sh _pid() function to take into account the thread identifier. From there on, it should be thread safe.
The assign function can also inject local variables instead of global ones. For example:
function myfunction {
    eval $(assign local x1 x2 x3 x4 := func2 53)
}
Seen from the outside, an external program and a function look the same to their users. Unless you try to figure it out, you cannot know if a command has been implemented as a function or as an external program. The advantage of this policy is that programs and functions are (almost) perfectly interchangeable.
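If you do want to figure it out, the type builtin reports how Bash would resolve a name. A small sketch, in which the function greet is invented for the illustration and bash is assumed to be found on the PATH:

```shell
#!/usr/bin/env bash
greet() { echo "hello"; }

kind_of_greet=$(type -t greet)   # a shell function
kind_of_echo=$(type -t echo)     # a shell builtin
kind_of_bash=$(type -t bash)     # an external program found via PATH

echo "greet: $kind_of_greet"
echo "echo:  $kind_of_echo"
echo "bash:  $kind_of_bash"
```

All three are invoked in exactly the same way; only type reveals the difference.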
Therefore, the classical function calling convention in Bash certainly has its unique advantages.
A disadvantage of this policy is that functions in Bash do not work like functions in other programming languages.
At the basis, I very much like the classical policy, because it potentially allows for an error-handling style that is superior to traditional exception handling. The only problem is that you have to take tight control over the command's output, like with the capture function. If you don't do that, error handling could actually turn out to be worse than in an exception-handling context. So, the approach indeed has much better potential, but you will still have to make it happen.
Since everything revolves around processes in Bash, just like in the underlying OS itself, Bash has the advantage that the OS can spread your program over all your machine cores whenever its commands run concurrently, as the stages of a pipeline and the commands started in the background with & do. There is no need to use external libraries to schedule co-routines or to spread the load across different CPUs.
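One way this shows up in practice is with background jobs. The sketch below starts three jobs with &, then collects their return codes with wait. The function work is a made-up placeholder whose argument simply becomes its return code:

```shell
#!/usr/bin/env bash
# Hypothetical unit of work; its argument is used as its return code.
work() {
    return "$1"
}

pids=()
for code in 0 3 0; do
    work "$code" &      # each job runs in its own process
    pids+=("$!")
done

# wait returns the return code of the job it waited for.
failures=0
for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
done
echo "failed jobs: $failures"
```

The operating system is free to schedule the three processes on different cores, and the return codes still travel back to the caller.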
Incessant process creation indeed causes overhead, but so does function call setup in other scripting languages; that does not come for free either. Furthermore, in Bash, it is trivially easy to distribute processes across different machines across the internet. Instead of writing:
command arg1 arg2 arg3 ...
Just write:
ssh user@server command arg1 arg2 arg3 ...
I personally think that Bash is badly underrated.
I consider it to be a valid substitute for other scripting languages such as perl, python, php, lua, or javascript. For all practical purposes, its functions are first class. At its core, it uses a pure list notation:
command arg1 arg2 arg3 ...
allowing for nested expressions through the use of different types of parentheses:
command1 arg11 $(command2 arg21 arg22 arg23 ... ) arg13 arg14 ...
with the command substitution parenthesis type, $(), being clearly the most important one, since command output on stdout is rightfully considered to be the most important output.
Fixing Bash is mostly a question of just adding a few sanitizing functions to bury its sometimes strange notational impurities behind a purer list notation, and to suppress unnecessary syntactic noise. For example, I do not use:
if [ -z "$string" ] ; then
...
fi
The standard bracketing is too noisy for my taste. Furthermore, I reject the conceptual burden of remembering what -z may mean. I just don't. I refuse to be bullied. Therefore, I have implemented a wrapper that causes the code to look like this:
if empty $string ; then
...
fi
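Such a wrapper can be very small. The version below is a guess at what it could look like, not necessarily the actual implementation. It treats a missing argument as empty, because an unquoted empty variable expands to no argument at all:

```shell
#!/usr/bin/env bash
# Hypothetical implementation of the wrapper.
empty() {
    # ${1-} also covers the case where an unquoted empty variable
    # expanded to no argument at all.
    [ -z "${1-}" ]
}

string=""
if empty $string ; then
    echo "string is empty"
fi
```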
I prefer the looks of this kind of notational purity. It is a quiet syntax, and self-evident for that matter. Unfortunately, the then keyword is not optional. It is mandatory, even though it is redundant. The language would be perfectly unambiguous without:
if list
expression1
expression2
...
fi
or:
if list; expression1 ; expression2; ...; fi
Therefore, the then keyword is one of the few unfortunate, mandatory impurities in the Bash language grammar.
Out of the box, the Euler notation typically in use in other scripting languages:
f(x1,x2,x3)
is much noisier than the quiet list notation in use in Bash:
f x1 x2 x3
Chaining function applications is much cleaner in list notation than in Euler notation:
g(f(x1,x2,x3))
versus:
g f x1 x2 x3
or:
g $(f x1 x2 x3)
if g happens to take more arguments than just a list, or if the arguments to g are generally supposed to be evaluated already.
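Both forms can be tried out with a few toy functions. The names f, g and twice below are invented for the illustration: g expects an already evaluated value, while twice takes a whole command as its argument list.

```shell
#!/usr/bin/env bash
# f produces a value on stdout; g consumes an already evaluated value.
f() { echo $(( $1 + $2 + $3 )); }
g() { echo $(( $1 * 2 )); }

# twice takes a whole command as its argument list and runs it two times.
twice() { "$@"; "$@"; }

evaluated=$(g $(f 1 2 3))     # the equivalent of g(f(1,2,3))
chained=$(twice f 1 2 3)      # twice applied to the list "f 1 2 3"

echo "$evaluated"
echo "$chained"
```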
Lots of issues can be solved just by prepending an additional function to the list. A typical incantation in Bash:
command arg1 arg2 arg3 &> /dev/null
can easily be made much quieter by implementing something like a shutUp function, and replacing the expression above with:
shutUp command arg1 arg2 arg3
Such a shutUp function, which can also handle input on stdin, could look like this:
function shutUp {
    # The wrapped command inherits stdin from the caller, so piped input
    # reaches it without any special handling.
    "$@" &> /dev/null
}
As you can see, the shutUp function concentrates syntactic noise that would otherwise just run loose in your own program. In fact, source code written in Bash can be very much sanitized to the point where only few notational impurities are left, along with the occasional unnecessary conceptual burden.
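A wrapper like this silences the output but not the return code, so it combines well with proper error handling. The sketch below uses a simplified stand-in named shutUpSketch, on the assumption that the wrapped command simply inherits stdin from the caller:

```shell
#!/usr/bin/env bash
# Simplified stand-in for shutUp: the wrapped command inherits stdin as is.
shutUpSketch() {
    "$@" &> /dev/null
}

echo "hello" | shutUpSketch grep hello
piped_status=$?           # 0: grep read the piped input and found a match

shutUpSketch ls /no/such/path/for/this/sketch
missing_status=$?         # non-zero: the failure is silenced but not lost

echo "piped: $piped_status, missing: $missing_status"
```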
That can certainly produce a rather pleasantly quiet impression in Bash source code, in fact, much quieter than in other scripting languages, whose Eulerian notation is noisy in a way that is fundamentally beyond repair.
Seeking to establish more notational purity in Bash would certainly contribute to unleashing its amazing true potential.