noskGPT

Simple transformer-based language model which generates Shakespearian dialogue.

noskGPT is a nano-scale (1,170,625-parameter) generative pretrained transformer that produces Shakespearian dialogue. It has been pre-trained on an abridged corpus of William Shakespeare's plays (see the shakespeare.txt file in the data/ directory) and uses simple character-level tokenisation. The scaled dot-product attention mechanism and transformer architecture were written from scratch in PyTorch and are inspired by the original transformer architecture outlined in Attention Is All You Need.
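
For a sense of what "from scratch" means here, a single causal attention head in the style of Attention Is All You Need can be sketched in PyTorch roughly as follows (an illustrative sketch with made-up names, not the exact code from this repository):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    """One head of causal scaled dot-product self-attention (illustrative)."""

    def __init__(self, embed_dim, head_dim, block_size, dropout=0.1):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # Lower-triangular mask: each position may only attend to earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape  # batch, sequence length, embedding dimensionality
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention: softmax(q @ k^T / sqrt(d_k)) @ v
        scores = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v  # (B, T, head_dim)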

Architecture 🏠

noskGPT is intended to be small and simple enough to train and run on a typical laptop CPU.

Summary of noskGPT architecture:

  • Vocabulary size: 65 characters
  • Block size / context length: 256 characters
  • Embedding dimensionality: 128
  • Attention heads per transformer block: 4
  • Layers / transformer blocks: 5
  • Dropout probability (training): 0.1

The model weights can be found in the weights.pth file in the root directory.
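
Assuming weights.pth is a plain PyTorch state dict, you can inspect it and verify the parameter count without building the model (a minimal sketch):

import torch

# Load the checkpoint on CPU and sum the parameter counts of its tensors
state_dict = torch.load("weights.pth", map_location="cpu")
num_params = sum(t.numel() for t in state_dict.values())
print(f"{num_params:,} parameters")  # expected: 1,170,625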

Installation 💽

To install noskGPT's dependencies, you will need Python 3.10. I personally like managing my Python versions and environments with miniconda and my project dependencies with poetry, but installation instructions are included for both poetry and conda/pip. With a Python 3.10 environment activated, run one of the following sets of commands:

Poetry

poetry env use $(which python)
poetry install

Conda / Pip

pip install -r requirements.txt

Command Line Interface 💻

noskGPT has a simple command line interface which can be invoked as follows from the root directory:

python noskgpt.py --max-chars=1000
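
Under the hood, a flag like --max-chars is typically declared with Python's standard argparse module; a minimal, hypothetical sketch (the real noskgpt.py may differ):

import argparse

parser = argparse.ArgumentParser(description="Generate Shakespearian dialogue with noskGPT.")
parser.add_argument(
    "--max-chars",
    type=int,
    default=1000,
    help="number of characters to generate per response",
)
args = parser.parse_args()
max_chars = args.max_chars  # dashes in flag names become underscores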

The optional --max-chars flag specifies the number of characters noskGPT will generate in response to a given prompt, and has a default value of 1000. An example prompt / response interaction with noskGPT is shown below:

[Screenshot: an example prompt / response interaction with noskGPT]

The generated text definitely isn't perfect: the overall tone feels very Shakespearian, but noskGPT tends to make up words (largely a consequence of the character-level tokenisation) and the overall narrative isn't very clear. However, I think this is pretty impressive for such a small and simple model!
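
For context, character-level tokenisation means each individual character is a token, so the model is free to emit any character sequence, including non-words. A minimal sketch of such a tokeniser built from the training corpus (illustrative, not necessarily the repository's exact code):

# Build a character-level tokeniser from the corpus in data/shakespeare.txt
with open("data/shakespeare.txt") as f:
    text = f.read()

chars = sorted(set(text))  # the 65-character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> character

def encode(s):
    return [stoi[ch] for ch in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

assert decode(encode("ROMEO:")) == "ROMEO:"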

Training ⌛

Training noskGPT took me about 3-4 hours on my M2 MacBook CPU.

Summary of noskGPT training details:

  • Batch size: 64
  • Learning rate: 0.001
  • Training epochs: 10,000
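
Put together, the training loop presumably follows the standard PyTorch pattern. A hedged sketch using the settings above, where model and get_batch stand in for the repository's actual model and data loader, and the choice of AdamW is my assumption:

import torch

BATCH_SIZE = 64       # batch size from the summary above
LEARNING_RATE = 1e-3  # learning rate from the summary above
NUM_EPOCHS = 10_000   # training epochs from the summary above

optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)  # optimiser is assumed

for epoch in range(NUM_EPOCHS):
    xb, yb = get_batch("train", BATCH_SIZE)  # placeholder data loader
    logits, loss = model(xb, yb)             # assumes the model returns (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()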

The final validation loss I was able to achieve using the architecture and training details above was around 1.52. If you have more patience than me or access to more compute resources, I would encourage you to try scaling up the model (or using more sophisticated tokenisation) to see if you can do better :)

To convert the training notebook from .py to .ipynb, run the following command from the root directory:

jupytext --to notebook training.py

Acknowledgements ❤️

Thank you very much to Andrej Karpathy for your wonderful tutorial on generative language models ✨
