gy741 / shell-complete

Shell Complete - WIP

Sequential Keras models for both command-line misprint correction and next-command prediction. The RNNs are trained on datasets of bash/zsh/fish history files gathered from GitHub.

Installation

pip3 install git+https://github.com/src-d/shell-complete

Run the data pipeline

  • Get the list of repositories

To use the GitHub API, you need to generate a personal access token; see the GitHub help. Then run:

shcomplete repos -t token -o output.txt
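shcomplete handles the API calls for you, but for reference an authenticated GitHub request just carries the token in the Authorization header. A minimal stdlib sketch; the endpoint, query, and `github_request` helper are illustrative, not part of shcomplete:

```python
import urllib.request

GITHUB_API = "https://api.github.com"

def github_request(path, token):
    """Build a GitHub API request authenticated with a personal access token."""
    req = urllib.request.Request(GITHUB_API + path)
    req.add_header("Authorization", "token " + token)
    req.add_header("Accept", "application/vnd.github.v3+json")
    return req

# Hypothetical query for repositories containing shell history files.
req = github_request("/search/code?q=filename:.bash_history", "YOUR_TOKEN")
```

Opening this request with `urllib.request.urlopen(req)` would return the JSON search results, subject to the API's rate limits.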
  • Get the history files using Scrapy
scrapy runspider repospider.py
  • Clean the dataset
shcomplete filtering -d shcomplete/data
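Cleaning typically means stripping metadata such as zsh extended-history timestamps (lines of the form `: <start>:<elapsed>;command`) and dropping blank lines. A hedged sketch of that idea, not the actual filtering implementation:

```python
import re

# zsh extended history lines look like ": 1468310420:0;git status".
ZSH_EXTENDED = re.compile(r"^: \d+:\d+;")

def clean_history(lines):
    """Strip zsh extended-history prefixes and drop blank lines."""
    cleaned = []
    for line in lines:
        line = ZSH_EXTENDED.sub("", line).strip()
        if line:
            cleaned.append(line)
    return cleaned
```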
  • Build a vocabulary of command line prefixes based on TF-DF score

Store command line prefixes into a trie data structure, using google/pygtrie. Compute the Term-Frequency Document-Frequency score of each prefix and prune the trie based on these numerical statistics to keep only the relevant prefixes. The level of noise in this vocabulary depends on the threshold parameter.

shcomplete tfdf -d shcomplete/data -o vocabulary.txt
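The idea can be sketched with a plain dict-based counter over token prefixes (the real implementation stores them in a pygtrie trie). The scoring rule below, term frequency times document frequency compared against a threshold, is an illustrative simplification:

```python
from collections import Counter

def prefixes(command, max_len=2):
    """Yield token prefixes of a command line, e.g. 'git push' -> 'git', 'git push'."""
    tokens = command.split()
    for i in range(1, min(max_len, len(tokens)) + 1):
        yield " ".join(tokens[:i])

def tfdf_vocabulary(histories, threshold):
    """Keep prefixes whose tf * df score reaches the threshold (illustrative scoring)."""
    tf = Counter()  # occurrences across all commands
    df = Counter()  # number of history files containing the prefix
    for history in histories:  # each history file is one document
        seen = set()
        for cmd in history:
            for p in prefixes(cmd):
                tf[p] += 1
                seen.add(p)
        for p in seen:
            df[p] += 1
    return {p for p in tf if tf[p] * df[p] >= threshold}
```

Raising the threshold prunes more aggressively, which is how the threshold parameter controls the level of noise in the vocabulary.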
  • Build the corpus, used as input when generating batches of training data
shcomplete corpus -d shcomplete/srcd -o output.txt
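For next-command prediction, batches are ultimately drawn from consecutive commands in the corpus. A hypothetical sketch of pairing each command with its successor; the real batch generator works on the corpus file produced above:

```python
def next_command_pairs(history):
    """Pair each command with its successor: (input, target) training examples."""
    return [(history[i], history[i + 1]) for i in range(len(history) - 1)]
```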

Train the sequential Keras models

Use the following command-line interfaces to train the RNNs for both misprint correction and next command prediction on the dataset of command-line histories built above.

shcomplete model2correct --help
shcomplete model2predict --help

For misprint correction, a sequential model that reached 99% accuracy on more than 1000 basic command-line prefixes after 100 epochs on 4 GPUs is provided in /saved_models. If you want it to take your aliases or specific commands into account, we recommend retraining this model on your own history.
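Misprint-correction models of this kind are commonly trained on (corrupted, clean) pairs produced by injecting artificial typos into real commands. A hedged sketch of one such noise model, not the project's actual data generation:

```python
import random

def add_misprint(command, rng=random):
    """Corrupt one character: swap, drop, or duplicate (illustrative noise model)."""
    if len(command) < 2:
        return command
    i = rng.randrange(len(command) - 1)
    op = rng.choice(["swap", "drop", "dup"])
    if op == "swap":
        chars = list(command)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)
    if op == "drop":
        return command[:i] + command[i + 1:]
    return command[:i] + command[i] + command[i:]  # duplicate character i
```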

Usage - WIP

License

Apache 2.0.
