ebiggers / mysh

A simple shell

mysh (name pending) is a relatively simple shell that supports some features of
a POSIX-compliant shell.

Shell input
===========

The shell can take input in one of the following ways:

- from standard input, which is done if the shell is executed with no non-option
  arguments or is executed with the '-s' option.  In the former case, the
  shell is considered interactive and will print a prompt before each line of
  input.
- directly from the command line, which is done if the '-c' option is passed.
- from a shell script, which is done if the shell is executed with a non-option
  argument; that argument is interpreted as the name of a shell script to
  execute.

Shell statements
================

The input takes the form of zero or more shell statements.  A shell statement is
terminated by a semicolon ';', a newline '\n', or a pound sign '#', the last of
which causes the rest of the line to be ignored as a comment.  A newline may be
escaped with '\' if it occurs unquoted or inside double quotes, in which case
the newline is ignored.

Each shell statement is a pipeline which consists of zero or more pipeline
components separated by the '|' character.  The pipeline may be followed by an
optional '&' character, which indicates that the pipeline is to be executed
asynchronously.  Each component of the pipeline consists of zero or more
variable assignments, followed by one or more words that specify the command to
execute and its arguments, followed by zero or more redirections.

Example shell input showing some supported features:

$ cat myfile.txt | grep "$pattern" | wc -l > wc.out 2> wc.err; echo 555 &; echo a\
string\
containing\
newlines > outfile; NEWVAR=value env

Asynchronous commands
=====================

Follow a pipeline with the '&' character to execute it asynchronously.  Its job
number and the pid of the last command in the pipeline will be printed.
Multiple concurrent background pipelines are supported.  The shell will check if
any background pipelines have terminated after each additional line of input
and will print their job numbers and pids again if so.  Real job control is not
supported, so you cannot actually refer to background pipelines by their job
numbers.
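
The "check after each line of input" step can be sketched with a non-blocking waitpid().  This is only an illustration, not the code mysh uses: the helper names start_background() and poll_background() are hypothetical, and reporting a fatal signal as 128 plus the signal number is an assumed convention borrowed from other shells.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] without waiting for it.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t start_background(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if execvp() failed */
    }
    return pid;
}

/* Hypothetical helper: non-blocking check on one background child.
 * Returns 1 and stores its exit status if it has terminated, 0 if it is
 * still running, -1 on error. */
int poll_background(pid_t pid, int *exit_status)
{
    assert(exit_status != NULL);
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0)
        return 0;
    if (r < 0)
        return -1;
    /* Without WUNTRACED, waitpid() reports only exits and fatal signals. */
    *exit_status = WIFEXITED(status) ? WEXITSTATUS(status)
                                     : 128 + WTERMSIG(status);
    return 1;
}
```

A shell loop would call poll_background() for every outstanding job after reading each line and print the job number and pid of those that return 1.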

Shell variables and parameter expansion
=======================================

Shell variables can be set using variable assignments of the form:

$ VAR=VALUE

If a shell statement consists only of variable assignments, the variables are
set in the currently executing shell, but they are not exported into the
environment until the 'export' builtin is used.  If the variable assignments
precede an external command, they are made only in the environment of the child
process for that command.

Shell variables can be unset using the 'unset' builtin:

$ unset VAR

When the shell is started, the environment is loaded into shell variables.
Also, the positional parameters ($0, $1, $2, ...) are set based on any
arguments given to the shell on the command line.  All other variables are
initially unset.  The positional parameters can also be changed by the 'shift'
builtin (to make $1 become $2, $2 become $3, etc.) or by the 'set' builtin (to
provide a new set of positional parameters).

Parameter expansion is done, so shell variables can be used in shell statements.
Parameter expansion takes the form of $VAR or ${VAR}, and it is done in unquoted
and double-quoted strings, but not single-quoted strings.  A few special
parameters are supported:

  $0, $1, ..., etc.  Positional parameters
  $#                 Number of positional parameters
  $*, $@             All positional parameters, space-separated
  $$                 Current process ID
  $?                 Exit status of last executed command
  $!                 Process ID of last component of last executed background
                         pipeline, or unset if no background pipelines have been
                         executed in the current shell.

Note: $@ currently does not behave correctly when used in a double-quoted string
(compared to a POSIX-compliant shell).  Also, the $IFS variable is not
supported.

Some other parameter expansions required by a POSIX-compliant shell are not
recognized; for example, ${VAR:-DEFVAL} is not recognized and is left in the
input unexpanded.
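
A minimal sketch of the $VAR and ${VAR} forms follows.  It is an illustration under stated assumptions rather than mysh's parser: expand_params() is a hypothetical name, values are looked up with getenv() instead of the shell's own variable table, names are limited to 63 characters, and quoting and the special parameters are ignored.  Any form it does not recognize is copied through literally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int is_name_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Append n bytes to out, keeping it NUL-terminated; -1 if it will not fit. */
static int append(char *out, size_t outlen, size_t *pos, const char *s, size_t n)
{
    if (*pos + n >= outlen)
        return -1;
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/* Hypothetical helper: expand $NAME and ${NAME} in `in` into `out`.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_params(const char *in, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL && outlen > 0);
    size_t pos = 0;
    out[0] = '\0';
    while (*in != '\0') {
        if (*in != '$') {                   /* ordinary character */
            if (append(out, outlen, &pos, in, 1) < 0)
                return -1;
            in++;
            continue;
        }
        const char *start = in + 1;
        int braced = (*start == '{');
        if (braced)
            start++;
        size_t len = 0;
        while (is_name_char(start[len]))
            len++;
        if (len == 0 || len >= 64 || (braced && start[len] != '}')) {
            /* Unrecognized form such as ${VAR:-x}: keep the '$' as-is. */
            if (append(out, outlen, &pos, in, 1) < 0)
                return -1;
            in++;
            continue;
        }
        char name[64];
        memcpy(name, start, len);
        name[len] = '\0';
        const char *val = getenv(name);     /* unset expands to nothing */
        if (val != NULL && append(out, outlen, &pos, val, strlen(val)) < 0)
            return -1;
        in = start + len + (braced ? 1 : 0);
    }
    return 0;
}
```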

Word splitting and gluing
=========================

If you do something like:

$ a="1 2"
$ echo $a

$a will expand to two strings and this occurs outside of double quotes, so
'echo' will be passed two separate arguments "1" and "2".  This is word
splitting.
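
Word splitting of an expanded value can be sketched as below.  Because $IFS is not supported, the sketch assumes a fixed separator set of space, tab, and newline; split_words() is a hypothetical name and not a function taken from the mysh source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `s` on spaces, tabs, and newlines.
 * Returns a NULL-terminated, heap-allocated array of heap-allocated words
 * and stores the word count in *count.  Returns NULL if allocation fails. */
char **split_words(const char *s, size_t *count)
{
    assert(s != NULL && count != NULL);
    const char *delims = " \t\n";
    size_t n = 0, cap = 4;
    char **words = malloc((cap + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    while (*s != '\0') {
        s += strspn(s, delims);             /* skip separators */
        size_t len = strcspn(s, delims);    /* length of the next word */
        if (len == 0)
            break;
        if (n == cap) {                     /* grow the array */
            cap *= 2;
            char **tmp = realloc(words, (cap + 1) * sizeof *words);
            if (tmp == NULL)
                goto fail;
            words = tmp;
        }
        words[n] = malloc(len + 1);
        if (words[n] == NULL)
            goto fail;
        memcpy(words[n], s, len);
        words[n][len] = '\0';
        n++;
        s += len;
    }
    words[n] = NULL;
    *count = n;
    return words;

fail:
    while (n > 0)
        free(words[--n]);
    free(words);
    return NULL;
}
```

Calling split_words("1 2", &n) yields the two words "1" and "2", which would then become separate arguments to the command.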

If you do something like:

$ echo "1"'2'

The strings "1" and '2' will be joined into the string "12", so 'echo' will be
passed one argument "12".  This is word gluing (there may be another name for
this...).

Filename expansion
==================

Unquoted strings passed to the shell are subject to filename expansion using the
glob() function, where '*' matches any sequence of characters other than '/',
'?' matches exactly one character, [abc] matches exactly one of the characters
a, b, or c, and [!abc] matches exactly one character other than a, b, and c.

Globs that do not match any files are left unchanged.  Glob special characters
can be escaped with the backslash character or put in quotes to prevent
expansion.

Filename expansion occurs after parameter expansion, word splitting, and word
gluing.
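
The "left unchanged" behavior is what the standard glob() flag GLOB_NOCHECK provides.  The sketch below assumes that flag; the exact flags mysh passes are not stated here, and expand_first() is a hypothetical helper written only for this example.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand `pattern`, copy the first resulting word into
 * `buf`, and return the number of words.  Returns -1 on error or if `buf`
 * is too small. */
int expand_first(const char *pattern, char *buf, size_t buflen)
{
    assert(pattern != NULL && buf != NULL);
    glob_t results;
    memset(&results, 0, sizeof results);

    /* GLOB_NOCHECK: when nothing matches, hand back the pattern itself. */
    if (glob(pattern, GLOB_NOCHECK, NULL, &results) != 0) {
        globfree(&results);
        return -1;
    }

    int count = (int)results.gl_pathc;
    int n = snprintf(buf, buflen, "%s", results.gl_pathv[0]);
    globfree(&results);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return count;
}
```

With a pattern whose directory does not exist, the call returns one word that is the pattern text itself, so the command receives the glob literally.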

Tilde expansion
===============

Tilde expansion (where ~ is replaced with $HOME, and ~USER is replaced with the
home directory of USER) is partially supported by making the glob() function do
it at the same time as filename expansion.  However, it will not work correctly
if the named file does not exist, as that apparently causes glob() to return the
glob literally without performing tilde expansion.

Redirections
============

A number of redirection operators are supported:

- Redirect file descriptor N (defaults to standard output) to a file:
        [N]>FILENAME

- Redirect file descriptor N (defaults to standard output) to a file, but start
  writing at the end of the file rather than overwriting it:
        [N]>>FILENAME

- Redirect both standard output and standard error to a file:
        &>FILENAME

- Append both standard output and standard error to a file:
        &>>FILENAME

- Make file descriptor N, which defaults to standard output, be a copy of file
  descriptor M:
        [N]>&M

- Redirect a file to an input file descriptor, defaulting to standard input:
        [N]<FILENAME

- Make file descriptor N, which defaults to standard input, be a copy of file
  descriptor M:
        [N]<&M

Redirections are made only for the preceding command, unless the 'exec' builtin
is used with no arguments, in which case any redirections take place in the
currently executing shell:

    $ exec &> file  # redirect stdout and stderr to a file for the remainder of the script
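
The [N]>FILENAME and [N]>>FILENAME forms come down to open() followed by dup2().  The following is a sketch under that assumption, not the mysh implementation; redirect_output() is a hypothetical name and 0666 is an assumed creation mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make `target_fd` refer to `path`, truncating the file
 * (append == 0) or appending to it (append != 0).
 * Returns 0 on success, -1 on failure. */
int redirect_output(int target_fd, const char *path, int append)
{
    assert(path != NULL && target_fd >= 0);
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(path, flags, 0666);
    if (fd < 0)
        return -1;
    if (fd == target_fd)        /* open() already returned the wanted number */
        return 0;
    int rc = dup2(fd, target_fd);
    close(fd);                  /* the temporary descriptor is not needed */
    return rc < 0 ? -1 : 0;
}
```

For an external command this would run in the forked child before the exec call; for 'exec' with no arguments it would run in the shell process itself, which is why the redirection persists.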

Redirections currently have the following limitations:

- Redirections must be listed after any command arguments.  So,

    $ echo > file 1

  must be written as:

    $ echo 1 > file

- Here documents and here strings are not supported.

- The special case of >& with no specified file descriptors to mean &> is not
  supported.

- The >| 'noclobber' redirection is not supported.

- Opening file descriptors (with <>) and closing file descriptors (with >&-)
  are not supported.

Control statements
==================

Control statements such as 'if', 'for', and 'case' are not supported.

Functions and aliases
=====================

Shell functions are not supported.  Aliases are partially supported; make them
with the 'alias' builtin.  There are some limitations; for example, aliases are
not expanded recursively.  Aliases are expanded only in interactive mode (this
is the correct behavior, not a limitation).  You can remove aliases using the
'unalias' builtin.

Command expansion
=================

Command expansion with `COMMAND` or $(COMMAND) is not supported.

Arithmetic expansion
====================

Arithmetic expansion with $(( EXPR )) is not supported.

Subshells and command grouping
==============================

Subshells with ( INPUT ) and command grouping with { INPUT } are not supported.

Shell options
=============

The 'set' builtin allows the options 'f', 'e', 'n', and 'v' to be set (with
-OPT) and unset (with +OPT).  Other POSIX-required options are not supported,
and the options cannot be set directly from the command line.  You also
cannot use 'set -o OPTION' to set an option by name.  The descriptions of
the options are:

  -e:  Exit the shell when a command exits with failure status.
  -f:  Disable filename expansion.
  -n:  Parse, but do not execute, the input.
  -v:  Print input as it's executed.
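
One way to picture the -OPT / +OPT handling is a bitmask that each argument sets or clears.  This is only a sketch: the OPT_* constants and apply_set_option() are hypothetical names, and mysh may store its options differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, one per supported option letter. */
enum { OPT_E = 1 << 0, OPT_F = 1 << 1, OPT_N = 1 << 2, OPT_V = 1 << 3 };

/* Hypothetical helper: apply an argument such as "-ef" or "+v" to *flags.
 * Returns 0 on success.  Returns -1 and leaves *flags untouched if `arg` is
 * not an option string or names an unsupported option. */
int apply_set_option(const char *arg, unsigned *flags)
{
    assert(arg != NULL && flags != NULL);
    if ((arg[0] != '-' && arg[0] != '+') || arg[1] == '\0')
        return -1;

    unsigned mask = 0;
    for (const char *p = arg + 1; *p != '\0'; p++) {
        switch (*p) {
        case 'e': mask |= OPT_E; break;
        case 'f': mask |= OPT_F; break;
        case 'n': mask |= OPT_N; break;
        case 'v': mask |= OPT_V; break;
        default:  return -1;    /* other POSIX options are not supported */
        }
    }
    if (arg[0] == '-')
        *flags |= mask;         /* -OPT sets */
    else
        *flags &= ~mask;        /* +OPT unsets */
    return 0;
}
```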

Builtins
========

Some builtins have been described already.  The full list of builtins is:

. filename [arguments ...]
:
alias [name=value ...]
cd [DIR]
eval [arg ...]
exec [command [arguments ...]]
exit [STATUS]
export VARIABLE[=VALUE] ...
getenv [VARIABLE]
help [COMMAND]
pwd
set [(-+)efnv] [--] [arg ...]
setenv VARIABLE [VALUE]
shift [N]
source filename [arguments ...]
unalias [name ...]
unset [name ...]

Limitation: the output from a builtin currently cannot be piped into another
command.  ('echo' works only because it hasn't been implemented as a builtin).

Command line editing
====================

Readline is supported for interactive use.  Support for this is included if the
readline headers are found in /usr/include/readline.  There are no
shell-specific completions installed; only the default filename completion is
used.

Command prompt
==============

The primary and secondary command prompts can be customized by setting the PS1
and PS2 variables (PS3 and PS4 are not supported).  The prompt can be at most
127 characters, and only a limited number of meta-sequences are supported:

Meta-sequence     Meaning

        \u        Current username
        \h        Current hostname, excluding domain name
        \w        Absolute path to current working directory
        \W        Absolute path to current working directory
        \s        Name of the shell
        \v        Version of the shell
        \$        '#' if effective user ID is 0; otherwise '$'
        \[        Begin a sequence of non-printing characters
        \]        End a sequence of non-printing characters
        \000      Literal character given in octal
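
Two of these meta-sequences, \$ and the three-digit octal form, are simple enough to sketch.  The function below is a hypothetical expand_prompt() that handles only those two and copies everything else through unchanged; it is not the mysh prompt code, and the other sequences are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

static int is_octal(char c)
{
    return c >= '0' && c <= '7';
}

/* Hypothetical helper: expand \$ and \NNN (exactly three octal digits) from
 * `ps` into `out`.  Returns the number of characters written, or -1 if
 * `out` is too small. */
int expand_prompt(const char *ps, char *out, size_t outlen)
{
    assert(ps != NULL && out != NULL);
    size_t pos = 0;
    while (*ps != '\0') {
        char c = *ps++;
        if (c == '\\' && *ps == '$') {
            c = (geteuid() == 0) ? '#' : '$';
            ps++;
        } else if (c == '\\' && is_octal(ps[0]) && is_octal(ps[1]) &&
                   is_octal(ps[2])) {
            int value = (ps[0] - '0') * 64 + (ps[1] - '0') * 8 + (ps[2] - '0');
            c = (char)(unsigned char)value;
            ps += 3;
        }
        if (pos + 1 >= outlen)  /* keep room for the terminating NUL */
            return -1;
        out[pos++] = c;
    }
    if (pos >= outlen)
        return -1;
    out[pos] = '\0';
    return (int)pos;
}
```

For example, a PS1 of \101\102 followed by "> " would expand to "AB> ".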

Startup files
=============

The shell will automatically execute commands from $HOME/.myshrc when it starts
up.  There is no difference between "login" shells and "nonlogin" shells.

Signals
=======

SIGINT (sent with Ctrl-C) is passed on to child processes running in the
foreground, if any; otherwise it will cause the current line of input to be
discarded.

SIGTSTP (sent with Ctrl-Z) is not handled.

Exit status
===========

The exit status of the shell is the exit status of the last command executed, 1
if the system ran out of memory, -1 if there was a parse error, 2 if the shell
could not understand its command line options, or 0 if the input to the shell
was empty or consisted only of whitespace and comments.

Implementation details
======================

External commands are executed using fork() and execvpe().  There is no caching
of command locations, so execvpe() will linearly search the entire PATH every
time a command is executed.
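
The linear PATH search that the C library performs on every exec looks roughly like the sketch below.  find_in_path() is a hypothetical helper written for illustration; the library itself tries to execute each candidate rather than calling access(), and a cache of command locations would store the result of a search like this one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: look for an executable called `name` in each
 * directory of the colon-separated list `path`, in order.
 * On success writes the full path to `buf` and returns 0; otherwise -1.
 * Note: access() also succeeds for directories, which a real shell would
 * need to rule out. */
int find_in_path(const char *name, const char *path, char *buf, size_t buflen)
{
    assert(name != NULL && path != NULL && buf != NULL);
    const char *dir = path;
    for (;;) {
        size_t dirlen = strcspn(dir, ":");
        int n;
        if (dirlen == 0)    /* an empty entry means the current directory */
            n = snprintf(buf, buflen, "./%s", name);
        else
            n = snprintf(buf, buflen, "%.*s/%s", (int)dirlen, dir, name);
        if (n >= 0 && (size_t)n < buflen && access(buf, X_OK) == 0)
            return 0;
        if (dir[dirlen] == '\0')
            return -1;      /* reached the end of the list */
        dir += dirlen + 1;  /* step past the ':' */
    }
}
```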

Pipes are created using pipe() and inherited by forked children.
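
A single pipe shared between a parent and one forked child can be sketched as follows.  capture_output() is a hypothetical helper, not a mysh function: the child inherits the pipe, makes its write end standard output, and executes a command, while the parent reads what the command printed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its standard output connected to a
 * pipe and read up to buflen - 1 bytes of that output into `buf`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t capture_output(char *const argv[], char *buf, size_t buflen)
{
    assert(argv != NULL && argv[0] != NULL && buf != NULL && buflen > 0);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipe's write end becomes standard output. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(126);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if execvp() failed */
    }

    /* Parent: close the write end, or read() would never see end-of-file. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

In a multi-stage pipeline the same idea repeats: each stage gets the read end of the previous pipe as standard input and the write end of the next pipe as standard output.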

The code to parse the shell input is all manually written (as opposed to using
flex and bison).  This probably would make it more difficult to implement
additional syntax such as control statements.

Shell variables are kept in a prefix tree (trie) rather than a hash table.
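
For readers unfamiliar with tries, here is a minimal sketch of one used as a variable table.  It assumes one child slot per possible byte, which is simple but memory-hungry; struct trie_node, trie_set(), and trie_get() are hypothetical names, and the layout in the mysh source may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct trie_node {
    struct trie_node *child[256];   /* one slot per possible byte */
    char *value;                    /* NULL if no variable ends here */
};

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Set `name` to a copy of `value`, creating nodes along the way.
 * Returns 0 on success, -1 if memory runs out. */
int trie_set(struct trie_node **root, const char *name, const char *value)
{
    assert(root != NULL && name != NULL && value != NULL);
    char *copy = dup_string(value);
    if (copy == NULL)
        return -1;

    struct trie_node **slot = root;
    for (;;) {
        if (*slot == NULL) {
            *slot = calloc(1, sizeof **slot);
            if (*slot == NULL) {
                free(copy);
                return -1;
            }
        }
        if (*name == '\0')
            break;
        slot = &(*slot)->child[(unsigned char)*name++];
    }
    free((*slot)->value);           /* replace any previous value */
    (*slot)->value = copy;
    return 0;
}

/* Return the value stored for `name`, or NULL if it is unset. */
const char *trie_get(const struct trie_node *node, const char *name)
{
    assert(name != NULL);
    while (node != NULL && *name != '\0')
        node = node->child[(unsigned char)*name++];
    return node != NULL ? node->value : NULL;
}
```

Lookup cost depends only on the length of the variable name, not on how many variables are set.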

Input to the shell is read using read() because lines may be of unlimited
length, and a shell statement need not correspond to one line of input, so
there's little reason to insist on always reading one line at a time.  (Note:
this doesn't really make any difference if the shell is used interactively.
Also, interactive mode will use readline() if support is compiled in.)

New builtin commands can be added fairly easily by implementing the builtin
function and adding an entry to the 'builtins' table in mysh_builtin.c.

Lists are used in lots of places in the code.  I borrowed the header from the
Linux kernel for this, since I think it's a good way to do lists in C.
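
The kernel-style list embeds a small link structure inside each item instead of allocating separate list nodes.  The sketch below is a cut-down re-creation of that idea, not the kernel header itself; struct job and sum_job_numbers() are hypothetical and exist only to show the list in use.

```c
#include <assert.h>
#include <stddef.h>

/* A doubly linked, circular list link, embedded inside each item. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover a pointer to the enclosing structure from a pointer to its link. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static inline void list_init(struct list_head *head)
{
    head->next = head->prev = head;     /* an empty list points at itself */
}

static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Hypothetical item type: a background job with its link embedded. */
struct job {
    int number;
    struct list_head link;
};

/* Walk the list and add up the job numbers. */
int sum_job_numbers(struct list_head *jobs)
{
    assert(jobs != NULL);
    int sum = 0;
    for (struct list_head *p = jobs->next; p != jobs; p = p->next)
        sum += container_of(p, struct job, link)->number;
    return sum;
}
```

Because the link lives inside the item, adding and removing items needs no extra allocation, and one item can sit on several lists by embedding several links.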
