All the broken
Too many broken
Shells
In our shellmounds
— Grayceon, Shellmounds

Build Your Own Shell

This is the material for a series of workshops I ran at my workplace on how to write a Unix shell.

The focus is slightly more on building an interactive shell than a scripting-oriented shell, only because I think this is more gratifying, even if it's less useful.

Be warned that some of the suggestions and discussion make opinionated choices without discussing equally-valid alternatives.

This is a work in progress and there may remain many infelicities. Patches Thoughtfully Considered. Feel free to report issues via GitHub.

Why write your own shell?

The shell is at the heart of Unix. It's the glue that makes all the little Unix tools work together so well. Understanding it sheds light on many of Unix's important ideas, and writing our own is the best path to that understanding.

This workshop has three goals:

to give you a better understanding of how Unix processes work;
- this will make you better at designing and understanding software that runs on Unix;
to clarify some common misunderstandings of POSIX shells;
- this will make you more effective at using and scripting ubiquitous shells like bash;
to help you build a working implementation of a shell you can be excited about working on.
- there are endless personal customizations you can make to your own shell, and making them can help you think about how you interact with your computer and how that interaction might be different.

(Some of this rationale is expanded on in my blog post, Building shells with a grain of salt.)

How to use this repository

I've tried to break this up into progressive stages that cover mostly orthogonal topics. Each stage contains a description of the facilities that will be discussed, a list of manpages to consult, and a set of tests. I've tried to also hint at some functionality that is fun but not necessary for the tests to pass.

In the root of this repository, there is a script called validate; you can run all the tests against your shell-in-progress by specifying the path to your shell's executable, like this:

$ ./validate ../mysh/mysh

It should tell you what stage you need to implement next. You can also run a stage by itself, or an individual test:

$ ./validate ../mysh/mysh stage_2
$ ./validate ../mysh/mysh stage_3 03

To run the tests, you will need expect, which is usually in a package called expect, and a C compiler. The way the tests are implemented is less robust than one might hope, but should suffice for our pedagogical goals. They are unfortunately somewhat timing-sensitive, so some tests will be flaky. If you encounter a particularly flaky test, please let me know.

The tests assume you will be implementing a vanilla Bourne-flavored shell with some ksh influences. Feel free to experiment with alternate syntax, but if so, you may need to adjust the tests. Except where specifically noted, bash (and ksh) should pass all the tests, so you can "test the tests" that way. (Try ./validate /bin/bash.) Likewise, cat should fail all the tests.

Originally, I targeted plain /bin/sh, but I decided the material in stage 5 was too important. Still, dash will pass everything but stage 5. There are also some other minor compatibility differences with some existing shells; you may run into them if you try them out. Any failure (of a supposedly POSIX shell) that isn't documented in the comments of a test should be reported as a bug.

Stages

1: fork/exec/wait

In which we discuss the basics of Unix processes, write the simplest possible shell, and then lay the foundations for the rest of the steps.
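
As a rough illustration of the fork/exec/wait cycle this stage is about, here is a minimal sketch in Python. The names `run_command` and `repl` and the `mysh:` error prefix are hypothetical, chosen only for this example; it splits on whitespace, has no prompt, and is not meant to pass the stage's tests as-is.

```python
import os
import sys


def run_command(argv):
    """Fork, exec argv in the child, wait for it, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError as err:
            message = "mysh: %s: %s\n" % (argv[0], err.strerror)
            os.write(2, message.encode())
            # Simplification: 127 for any exec failure (shells use it for
            # "command not found").
            os._exit(127)
    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: follow the common shell convention of 128 + signo.
    return 128 + os.WTERMSIG(status)


def repl(lines=sys.stdin):
    """Run one whitespace-separated command per line until EOF."""
    status = 0
    for line in lines:
        argv = line.split()
        if argv:
            status = run_command(argv)
    return status
```

Calling `repl()` reads commands from standard input. Note that the child leaves via `os._exit` rather than `sys.exit`, so that a failed exec does not run the parent's cleanup handlers a second time.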

2: files and pipes

In which we add pipes and fd redirection to our shell.
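
To make the idea concrete, here is one possible sketch, in Python, of connecting two commands with a pipe and optionally redirecting the last command's output to a file. The helper names `spawn` and `run_pipeline` are hypothetical, and the sketch handles only a two-command pipeline with no error reporting.

```python
import os


def spawn(argv, stdin_fd=0, stdout_fd=1, close_fds=()):
    """Fork a child that runs argv with the given fds installed as 0 and 1."""
    pid = os.fork()
    if pid == 0:
        # os.dup2 marks the new fd inheritable by default, so it survives exec.
        if stdin_fd != 0:
            os.dup2(stdin_fd, 0)
        if stdout_fd != 1:
            os.dup2(stdout_fd, 1)
        # Close the pipe ends the child does not need under their old numbers.
        for fd in close_fds:
            os.close(fd)
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)
    return pid


def run_pipeline(left, right, outfile=None):
    """Run `left | right`, optionally with `> outfile`; return right's status."""
    read_end, write_end = os.pipe()
    out_fd = 1
    if outfile is not None:
        out_fd = os.open(outfile, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    pipe_fds = (read_end, write_end)
    left_pid = spawn(left, stdout_fd=write_end, close_fds=pipe_fds)
    right_pid = spawn(right, stdin_fd=read_end, stdout_fd=out_fd,
                      close_fds=pipe_fds)
    # The parent must close its copies, or `right` never sees end-of-file.
    os.close(read_end)
    os.close(write_end)
    if out_fd != 1:
        os.close(out_fd)
    os.waitpid(left_pid, 0)
    _, status = os.waitpid(right_pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

For example, `run_pipeline(["ls"], ["sort", "-r"], outfile="listing.txt")` behaves roughly like `ls | sort -r > listing.txt`. Forgetting to close the write end in the parent is a classic cause of pipelines that hang.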

3: job control and signals

In which we discuss signals and add support for ever-helpful chords like ^C, ^\, and ^Z.
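
The signal half of this stage can be sketched as follows, assuming Python and a hypothetical helper named `run_foreground`. The shell ignores the keyboard signals while a foreground command runs, and the child restores the default dispositions before exec, because ignored signals stay ignored across exec. Real job control also involves process groups and the terminal's foreground group (`setpgid`, `tcsetpgrp`), which this sketch deliberately leaves out.

```python
import os
import signal

# Signals generated by ^C, ^\ and ^Z respectively.
JOB_SIGNALS = (signal.SIGINT, signal.SIGQUIT, signal.SIGTSTP)


def run_foreground(argv):
    """Run argv in the foreground and describe how it ended or stopped."""
    # The shell itself should survive ^C and friends, so ignore them here.
    previous = {sig: signal.signal(sig, signal.SIG_IGN) for sig in JOB_SIGNALS}
    try:
        pid = os.fork()
        if pid == 0:
            # Child: restore default actions so the command can be interrupted.
            for sig in JOB_SIGNALS:
                signal.signal(sig, signal.SIG_DFL)
            try:
                os.execvp(argv[0], argv)
            except OSError:
                os._exit(127)
        # WUNTRACED makes waitpid also return when the child is stopped (^Z).
        _, status = os.waitpid(pid, os.WUNTRACED)
    finally:
        # Put back whatever handlers were installed before the command ran.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))
```

A fuller shell would remember the pid of a stopped job so that it can later be resumed with `SIGCONT`; here the return value only reports what happened.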

4: quoting and expansion

In which we discuss environments, variables, globbing, and other oft-misunderstood concepts of the shell.
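
One commonly misunderstood point is the order of operations: variables are expanded first, then unquoted results are split into fields, then each field is globbed, while double quotes suppress the last two steps. The sketch below, in Python, illustrates only that ordering. The function name `expand_word` and the `quoted` flag are hypothetical, and it treats a word as wholly quoted or wholly unquoted, which a real shell does not.

```python
import glob
import re

# Only the plain $NAME form is handled; ${NAME} and friends are left out.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_word(word, env, quoted=False):
    """Expand one already-tokenized word into a list of argv fields."""
    # Step 1: parameter expansion; unset variables expand to nothing.
    text = VARIABLE.sub(lambda match: env.get(match.group(1), ""), word)
    if quoted:
        # Double quotes: exactly one field, even if it is empty.
        return [text]
    fields = []
    # Step 2: field splitting on whitespace (a fixed, default-like IFS).
    for field in text.split():
        # Step 3: pathname expansion, only if the field looks like a pattern.
        matches = []
        if any(char in field for char in "*?["):
            matches = sorted(glob.glob(field))
        # A pattern that matches nothing is passed through unchanged.
        fields.extend(matches or [field])
    return fields
```

With `env = {"FILES": "a b"}`, `expand_word("$FILES", env)` yields two fields, `["a", "b"]`, whereas `expand_word("$FILES", env, quoted=True)` yields the single field `["a b"]`. That difference is the reason quoting variables matters so much in scripts.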

5: interactivity

In which we apply some polish to our shell to make it usable for interactive work.

&: where to go next

In which I prompt you to go further.

Shells written from this workshop

I'll link to some of the shells that were written as a result of this workshop here shortly, including a couple I wrote to serve as examples of different approaches.

Supplementary Material

Tools

The shtepper is a great resource for understanding shell execution. You can input an expression and see in excruciating detail how it should be evaluated.

shtepper

Documents

Advanced Programming in the Unix Environment by Stevens covers all this stuff and is a must-read. I call this APUE throughout this tutorial.
Chet Ramey describes the Bourne-Again Shell in the Architecture of Open Source Applications; this is probably the best thing to read to understand the structure of a real shell.
Michael Kerrisk's The Linux Programming Interface, though fairly Linux-specific, has some great coverage of many of the topics we'll touch on. I call this LPI throughout this tutorial.
Unix system programming in OCaml shows the development of a simple shell.
Advanced Unix Programming by Rochkind; chapter 5 has a simple shell.
the tour of the Almquist shell is outdated but may help you find where some things are implemented in dash and other ash descendants.

References

the POSIX standard explains the expectations for the shell and its utilities in reasonable detail.
there are POSIX conformance test suites but they don't seem to be available in convenient, non-restricted forms.
yash's posix-shell-tests are only runnable with yash, but the tests themselves are full of useful ideas.

Shells to Examine

busybox: C; contains both ash and hush, and test suites.
mksh: C; non-interactive tests.
rc: C; fairly minimal.
zsh: C; extremely maximal.
bash: C.
fish: C++11; has expect-based interactive tests.
Thompson shell: C; the original Unix shell; very minimal.
scsh: Scheme and C; intended for scripting.
cash: OCaml; based on scsh.
eshell: Emacs Lisp.
oil: Python and C++; has an extensive test suite.
xonsh: Python.
oh: Go.
yash-rs: Rust.

Links to Resources by Language

Although there is an elegant relationship between C and Unix which makes it attractive to write a shell in the former, to minimize frustration I suggest trying a higher-level language first. Ideally the language will have good support for:

making POSIX syscalls
string manipulation
hash tables

Languages that provide a lot of their own infrastructure with regard to signals or threads may be much more difficult to use.

C++

http://basepath.com/aup/ex/group__Ux.html

Common Lisp

The most convenient library would be iolib, which you can get through Quicklisp. You'll need to install libfixposix first. There's also sb-posix in SBCL for the daring.

Haskell

use the unix package
Hell might be a starting point

Java / JVM-based languages

You will probably run into issues related to the JVM, particularly with signals and forking, but as a starting point, you could do worse than loading libc with JNA.

There's also jtux.

Lua

There are a variety of approaches, but ljsyscall looks promising. luaposix might be sufficient.

OCaml

perl

See perlfunc(3perl); all the functions we want are at hand, usually with the same name.

Python

Although Python provides higher-level abstractions like subprocess, for the purposes of this workshop you probably want to use the functions in os.

Please note an important gotcha for stage 2! Since Python 3.4, fds have defaulted to non-inheritable, which means you'll need to explicitly call os.set_inheritable(fd, True) on any file descriptor you intend to pass down to a child.
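
A small experiment can make the gotcha visible. The sketch below uses a hypothetical helper, `survives_exec`, which forks, execs a fresh Python interpreter, and has that interpreter check whether a given fd number is still open. It assumes `sys.executable` points at a usable interpreter and that the new interpreter does not happen to reuse the same fd number itself.

```python
import os
import sys


def survives_exec(fd):
    """Return True if `fd` is still open in a child after fork and exec."""
    pid = os.fork()
    if pid == 0:
        # The probe runs in the new program image; os.fstat fails with
        # OSError if the descriptor was closed during exec.
        probe = (
            "import os, sys\n"
            "try:\n"
            "    os.fstat(%d)\n"
            "except OSError:\n"
            "    sys.exit(1)\n" % fd
        )
        try:
            os.execv(sys.executable, [sys.executable, "-c", probe])
        except OSError:
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Calling `survives_exec` on an end of a fresh `os.pipe()` should report that the fd is gone after exec, and calling it again after `os.set_inheritable(fd, True)` should report that it survives. Descriptors installed with `os.dup2` are inheritable by default, so the usual dup2-onto-0-or-1 pattern for pipes needs no extra call.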

Racket

The implementation seems a little too heavy to do this conveniently, but see the Scheme section below for alternatives.

Ruby

Process has most of what you need. You can use Shellwords, but you decide whether that's cheating.

Rust

Although we use few enough calls that you could just create bindings directly, either to libc with the FFI or by directly making syscalls, for just getting something working, the nix-rust library should provide all the necessary facilities.

Scheme

Guile already has all the calls you need; see the POSIX section of the Guile manual. Another approach would be to use something like Chibi Scheme with bindings to libc calls.

Tcl

Although core Tcl doesn't provide what's necessary, expect probably does. For example, Tcl doesn't have a way to exec, but expect provides overlay to do this.
