Shell Complete - WIP

Sequential Keras models for both command-line misprint correction and next-command prediction. The RNNs are trained on a dataset of bash/zsh/fish history files gathered from GitHub.

Installation

pip3 install git+https://github.com/src-d/shell-complete

Run the data pipeline

Get the list of repositories

To use the GitHub API, you need to generate a personal access token (see the GitHub help). Then run:

shcomplete repos -t token -o output.txt

Get the history files using Scrapy

scrapy runspider repospider.py

Clean the dataset

shcomplete filtering -d shcomplete/data

Build a vocabulary of command line prefixes based on TF-DF score

Store command-line prefixes in a trie, using google/pygtrie. Compute the Term-Frequency Document-Frequency (TF-DF) score of each prefix and prune the trie on these scores to keep only the relevant prefixes. The level of noise left in the vocabulary depends on the threshold parameter.

shcomplete tfdf -d shcomplete/data -o vocabulary.txt
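The pruning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: it uses a plain nested dict instead of google/pygtrie, splits commands on whitespace, and assumes the score is the product of the term frequency (total occurrences of a prefix) and the document frequency (number of history files containing it). The function names `prefixes`, `score_prefixes` and `build_pruned_trie` are hypothetical.

```python
from collections import defaultdict


def prefixes(command):
    """Yield every token-level prefix of a command line as a tuple of tokens."""
    tokens = command.split()
    for end in range(1, len(tokens) + 1):
        yield tuple(tokens[:end])


def score_prefixes(histories):
    """Return {prefix: (term_frequency, document_frequency)}.

    histories: a list of history files, each given as a list of command lines.
    """
    tf = defaultdict(int)  # total number of occurrences of each prefix
    df = defaultdict(int)  # number of history files containing each prefix
    for history in histories:
        seen = set()
        for command in history:
            for prefix in prefixes(command):
                tf[prefix] += 1
                seen.add(prefix)
        for prefix in seen:
            df[prefix] += 1
    return {prefix: (tf[prefix], df[prefix]) for prefix in tf}


def build_pruned_trie(histories, threshold):
    """Build a nested-dict trie of the prefixes whose score reaches threshold.

    Assumption: score = tf * df. The real scoring formula may differ.
    """
    trie = {}
    for prefix, (tf, df) in score_prefixes(histories).items():
        if tf * df < threshold:
            continue  # prune rare prefixes
        node = trie
        for token in prefix:
            node = node.setdefault(token, {})
    return trie


if __name__ == "__main__":
    histories = [
        ["git status", "git commit -m x", "ls -la"],
        ["git status", "cd /tmp"],
        ["git push", "ls"],
    ]
    print(build_pruned_trie(histories, threshold=4))
```

Because a shorter prefix occurs at least as often as any of its extensions, its score under this assumed formula is never lower, so a kept prefix always has all of its ancestors in the trie. Raising the threshold shrinks the vocabulary; lowering it lets rarer, noisier prefixes through.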

Build the corpus used as input when generating batches of data

shcomplete corpus -d shcomplete/srcd -o output.txt
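As a rough picture of how a corpus file could feed batch generation, the sketch below encodes each command line as a fixed-length list of character indices and groups them into batches. It assumes one command per line and character-level encoding; neither is stated above, and the helpers `build_char_index`, `encode` and `iter_batches` are hypothetical names, not part of shcomplete.

```python
def build_char_index(lines):
    """Map each character seen in the corpus to a positive integer index."""
    chars = sorted({ch for line in lines for ch in line})
    return {ch: i + 1 for i, ch in enumerate(chars)}  # 0 is reserved for padding


def encode(line, char_index, max_len):
    """Truncate or pad a command line to max_len character indices."""
    # Characters missing from char_index also map to 0 in this simple sketch.
    ids = [char_index.get(ch, 0) for ch in line[:max_len]]
    return ids + [0] * (max_len - len(ids))


def iter_batches(lines, char_index, batch_size, max_len):
    """Yield lists of at most batch_size encoded command lines."""
    batch = []
    for line in lines:
        batch.append(encode(line, char_index, max_len))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, batch


if __name__ == "__main__":
    corpus = ["ls", "cd", "ls -la"]
    index = build_char_index(corpus)
    for batch in iter_batches(corpus, index, batch_size=2, max_len=6):
        print(batch)
```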

Train the sequential Keras models

Use the following command-line interfaces to train the RNNs for both misprint correction and next-command prediction on the dataset of command-line histories built in the previous steps.

shcomplete model2correct --help
shcomplete model2predict --help

For misprint correction, a sequential model that reached 99% accuracy on more than 1000 basic command-line prefixes after 100 epochs on 4 GPUs is provided in /saved_models. If you want it to take your aliases or specific commands into account, we recommend training this model on your own history.
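Training a correction model on your own history requires pairs of misprinted and clean commands. How the project produces them is not described here; one common approach, sketched below purely as an assumption, is to inject a single random character edit into each clean command. The names `add_misprint` and `make_pairs`, and the lowercase-only alphabet, are hypothetical.

```python
import random


def add_misprint(command, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return command with one random character edit applied.

    The edit is a deletion, insertion, substitution, or swap of two
    adjacent characters. A substitution may pick the same character,
    and a swap on the last character is a no-op, so the output can
    occasionally equal the input.
    """
    if not command:
        return command
    i = rng.randrange(len(command))
    op = rng.choice(["delete", "insert", "substitute", "swap"])
    if op == "delete":
        return command[:i] + command[i + 1:]
    if op == "insert":
        return command[:i] + rng.choice(alphabet) + command[i:]
    if op == "substitute":
        return command[:i] + rng.choice(alphabet) + command[i + 1:]
    if i + 1 < len(command):
        return command[:i] + command[i + 1] + command[i] + command[i + 2:]
    return command


def make_pairs(commands, seed=0):
    """Return (noisy_input, clean_target) pairs for a list of commands."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [(add_misprint(command, rng), command) for command in commands]


if __name__ == "__main__":
    for noisy, clean in make_pairs(["git status", "ls -la", "cd /tmp"]):
        print(repr(noisy), "->", repr(clean))
```

Feeding your own history file through such a function would expose the model to your aliases and project-specific commands as correction targets.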

Usage - WIP

License

Apache 2.0.
