GSH is a pluggable version of DSH (Distributed Shell) written in Python. Both a module and a command-line tool for running a shell command over multiple machines are included. GSH can be extended by adding new host loaders as well as hooking into various stages of the runtime.
While I plan to keep the api stable for the command line, the module's interface and plugin interfaces are still potentially evolving and I can't guarantee all changes will be backwards compatible at this time.
New versions will be updated to PyPI pretty regularly so it should be as easy as:
pip install gsh
# Run uptime across all nginx machines
gsh -g nginx uptime
# Restart nginx serially
gsh -wg nginx sudo /etc/init.d/nginx restart
# Run w across three manual machines.
gsh -m host1,host2 -m host3 w
Many of the default configuration options can be overridden with a configuration file. Configuration is currently specified as a YAML document. The following is what the default configuration would look like:
forklimit: 64
print_machines: true
concurrent: true
timeout: null
plugin_dirs: []
hooks: []
Configuration files are read from the following locations, being overridden in the order they are read:
- /etc/gsh/gsh.yaml
- ~/.gsh/gsh.yaml
One of the default host loaders (-g) allows you to specify group files by name. Group files contain a newline delimited list of hosts with an optional user@ specified. This loader will search for these files in the following locations
- ~/.gsh/group
- ~/.dsh/group
- /etc/gsh/group
- /etc/dsh/group
The first group file found will win. The loader will not continue searching through all paths. An example group that might be accessed when a user types the command gsh -g test_hosts uptime would be:
/etc/gsh/group/test_hosts
test_host1.example.com
test_host2.example.com
# Users can be specified. Also comments are ignored.
roleaccount@test_host2.example.com
This same file format is usable for the file (-f) loader, though you must specify the full path. This loader is more suited to operating over ad hoc built lists of machines.
Plugins are the main reason I set out to create GSH. At my last job we created DSH replacement that specifically worked with our machine database. Now I'm faced with that same decision for querying our machine database directly. In addition to just host loaders, I also want to be able to log the commands. This is where hooks come in.
Loaders are plugins that allow you to build host lists from arbitrary arguments. They should be stored in a specified plugin_dirs location or the default location of /etc/gsh/plugins/loaders. The base class, BaseHostLoader in gsh.plugin, as well as the builtin plugins located in gsh/plugins/loaders in the package itself, are the best way to go about learning how to add new loaders.
As an example, I currently use a machine database that provides a nice set arithmetic syntax. The following example shows how easy it is to add this to GSH:
from mdbset import mdbset
class MdbSetLoader(BaseHostLoader):
opt_short = "-q"
def __call__(self, *args):
return mdbset(*args)
Now we can simply do the following to get a list of all nginx boxes that aren't serving mobile using:
gsh -q "nginx -nginx-mobile"
While the code for the loader does wave over error handling, it really is that simple to extend GSH to provide additional host loading mechanics.
The initial concept for hooks came when I wanted to log the various aspects of DSH to my real-time event logging framework. This was difficult with DSH as I had to parse various aspects of out the output and not all data was easily exposed.
As with loaders, they should be stored in a specified plugin_dirs location or the default location of /etc/gsh/plugins/hooks. The base class, BaseExecutionHook in gsh.plugin, as well as the builtin plugins located in gsh/plugins/hooks in the package itself, are the best way to go about learning how to add new hooks.
Over the last several years DSH has been invaluable to my career as a System's Administrator, however it has a few shortcomings which have come up over time. I've always wanted to extend DSH in various ways specific to my environment and usually ended up with various wrapper scripts to load hosts from inventory databases.
I specifically chose Python for this project as it is the language I use most often and this allows me to use it as a module without shelling out. While benchmarks have shown GSH to be slower, the overhead seems to be near constant. Considering the extensibililty, and that most of the time spend is waiting on network I/O, I consider this a fair tradeoff.
Besides the loaders and hooks mentioned in other sections, GSH provides a few general benefits over DSH.
Specifying a loader option (-m/-g/etc) without a command will list the hosts.
Many times I'd like to just see which hosts are in a group. As I added new host loaders for dynamically building host lists this became a more common desire. With DSH if you failed to specify a command it would just SSH to your machines and exit. GSH will print the list of machines that would be be used in the absense of a command.
The fork limit (-F) can be specified as a percentage.
Often times you'll end up scripting rolling restarts of services over various groups of hosts. You might end up specifying a fork limit of 24 forgetting one of your smaller web pools is smaller than that causing them to all be restarted at once, causing an outage. Your options previously were lowering the limit to the lowest common denominator or writing annoying wrapper scripts. GSH allows you to specify the fork limit as a percentage. e.g. gsh -g mobileweb -F 20% "/etc/init.d/nginx restart"
ps output is cleaner / less forking madness.
While this may seem like a silly thing to list as an improvement, it has sufficiently improved my quality of life. In DSH you saw 3 forks for each host that you were connected to. This made it annoying to find which process you should be killing when it hung. It's also utilizing more processes for no noticable benefit. An example of DSH output is as follows:
bash
\_ dsh -Mcg test_hosts uptime
\_ dsh -Mcg test_host1 uptime
| \_ dsh -Mcg test_host1 uptime
| \_ rsh test_host1 uptime
\_ dsh -Mcg test_host2 uptime
\_ dsh -Mcg test_host2 uptime
\_ rsh test_host2 uptime
This is GSH output for comparison:
bash
\_ python gsh -g test_hosts uptime
\_ ssh test_host1 uptime
\_ ssh test_host2 uptime
Timeouts!
Speaking of having to know which processes to kill when they've hung... GSH provides a timeout option. While setting your SSH connection timeout is nice, sometimes the command just hangs indefinitely and you want to get your shell back. The -t option of gsh will timeout long running processes after the specified allotment of time. Be careful with this though as it will kill you command ungracefully (-9).
Concurrency improvements.
DSH had this weird issue where it couldn't do a fork limit of 1. It would always have one more host than you has specified. This is a behavior I did not replicate. A fork limit of 1 is now synonymous with serial execution mode. In addition to this change I have instituted a default fork limit of 64. DSH would gladly kill your machine by default by forking way too many processes. This limit can be overridden on the command line or in your personal or system configuration files, however no option exists to eliminate this limit.
What the hell does GSH stand for?
GSH stand's for Gary's Shell. Mostly just because I didn't know what to call it and it's close to DSH. Plus who doesn't want a piece of software named after themselves.
How do you pronounce GSH?
You can either pronounce each character individually or some of us at work have taken to calling it "geesh." The G in geesh is pronounced like the G in gum and not like a J.