graphcore-research / llm-inference-research

An experimentation platform for LLM inference optimisation


LLM inference research

An experimentation framework from Graphcore Research, used to explore the machine-learning performance of post-training model adaptation for accelerating LLM inference.

See: SparQ Attention.
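
SparQ Attention reduces how much of the KV cache must be read at each decoding step. As a rough, illustrative sketch of the idea (not the implementation in this repository, and omitting the paper's mean-value reallocation term), the snippet below approximates attention scores from the r largest-magnitude query components and then computes exact attention over only the top-k selected key positions:

import torch
import torch.nn.functional as F

def sparq_like_attention(q, K, V, r=16, k=64):
    # q: (d,) current query; K, V: (S, d) cached keys/values.
    # r: query components used for the approximation; k: positions attended exactly.
    d = q.shape[-1]
    k = min(k, K.shape[0])

    # 1. Approximate scores using only the r largest-|q| components of the query.
    idx = torch.topk(q.abs(), min(r, d)).indices
    approx_scores = (K[:, idx] @ q[idx]) / d ** 0.5

    # 2. Pick the top-k key positions under the approximate scores.
    top = torch.topk(approx_scores, k).indices

    # 3. Exact attention restricted to those positions only.
    weights = F.softmax((K[top] @ q) / d ** 0.5, dim=-1)
    return weights @ V[top]

# Example: a 1024-token KV cache with 128-dimensional heads.
q, K, V = torch.randn(128), torch.randn(1024, 128), torch.randn(1024, 128)
print(sparq_like_attention(q, K, V).shape)  # torch.Size([128])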

Setup

See scripts/Eval.ipynb and scripts/Quantisation.ipynb for usage.

python3 -m venv .venv
# Append these environment variables to .venv/bin/activate so they are set whenever the venv is activated:
cat >> .venv/bin/activate << 'EOF'
export PYTHONPATH="${PYTHONPATH}:$(dirname ${VIRTUAL_ENV})"
export TOKENIZERS_PARALLELISM=true
EOF

source .venv/bin/activate
pip install wheel
# On a CPU-only machine, you may need to run this before `pip install -r requirements.txt`
# pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

# Optional - notebooks
git clone git@github.com:PRODUCT-AI-ENGINEERING-GCAI/research-llm-inference.git --branch notebooks notebooks/
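
After installing, a quick sanity check (a minimal sketch, assuming the PYTHONPATH export above and that torch is installed from requirements.txt) confirms the environment resolves correctly:

# Run inside the activated virtualenv, e.g. `python3 check_env.py` (hypothetical file name).
import os
import torch

print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())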

Development

We use a script called dev to automate building, testing, etc.

./dev
./dev --help

License

Copyright (c) 2023 Graphcore Ltd. Licensed under the MIT License.

See NOTICE.md for further details.
