lordzuko / tinyLM

A small neural network to perform language modelling on an example piece of text

Task

Train a small neural network to perform language modelling on an example piece of text.

Requirements

  • Python 3
  • Pip

Installation

  • Create a virtual environment (the recommended name is venv so that it is git-ignored by default)
  • Activate the environment
  • Run pip install -r requirements.txt

Refs:

Codebase adapted from minGPT: https://github.com/karpathy/minGPT/

Task Description

Your task is to make the highest-quality language model you can. You are provided with an initial (faulty) attempt at a pipeline intended to train a neural network architecture to do self-supervised character-level language modelling. The provided model is a fully-connected 2-layer neural network with a custom Heaviside activation function, which is set up to be trained using a 0-1 loss, with 10 characters as input and the subsequent one as the target. Here is the recommended way to approach this task:

  • Understand the relevant bits of the codebase and identify key issues with the current training pipeline (15 minutes)
  • Come up with a plan on how to improve on the provided architecture and pipeline (15 minutes)
  • Execute the planned changes (2 hours)
  • Write up a report of implementation, findings, and further ideas. Make sure to describe the trade-offs of the architectural decisions you made. (30 minutes) ✅

Bonus points (if you have time):

  • A colleague suggests using word embeddings instead of doing character-level modelling. Describe the trade-offs that change offers. Suggest any alternatives that come to mind. ✅
  • The chosen accuracy measure doesn’t provide a good understanding of model performance. Suggest an alternative measure for model quality and describe the tradeoff. ✅
  • The model is currently trained for a set, arbitrary number of iterations. Implement a more informed stopping criterion. ✅

Initial Code with 0-1 Loss

The initial pipeline, with its bugs fixed, can be found in the init_debugging branch.
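
For orientation, the provided starting point corresponds roughly to the sketch below. The class name, layer sizes, and the use of an embedding layer are illustrative assumptions rather than the actual identifiers in the repository; the comments note why this setup cannot learn via gradient descent.

```python
import torch
import torch.nn as nn

BLOCK_SIZE = 10  # characters of context used to predict the next character

class HeavisideMLP(nn.Module):
    """Rough sketch of the provided baseline: a fully-connected 2-layer
    network with a Heaviside step activation (names and sizes illustrative)."""

    def __init__(self, vocab_size: int, hidden_size: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 32)  # assumption: some character encoding
        self.fc1 = nn.Linear(BLOCK_SIZE * 32, hidden_size)
        self.fc2 = nn.Linear(hidden_size, vocab_size)

    def forward(self, idx):
        x = self.embed(idx).flatten(1)  # (batch, BLOCK_SIZE * 32)
        # Heaviside step activation: its gradient is zero almost everywhere,
        # so no learning signal flows back through fc1.
        h = torch.heaviside(self.fc1(x), torch.tensor(0.0))
        return self.fc2(h)  # logits over the next character

# The 0-1 "loss" counts wrong predictions; it is piecewise constant,
# so gradient descent receives no useful signal from it either.
def zero_one_loss(logits, targets):
    return (logits.argmax(dim=-1) != targets).float().mean()
```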

Changes

  • Added:
  1. Patience-based early stopping criterion
  2. GPT model
  3. Perplexity as an evaluation metric (sketched below)
  4. GPT model config - get_gpt_config method in main.py
  • Updated:
  1. Loss function - cross-entropy
  2. Feedforward model config - get_ff_config method in main.py

Ideally the list of changes would live in a CHANGELOG.md file, but it is included here as a section to keep things simple for this task.
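
For reference, perplexity follows directly from the mean cross-entropy (in nats) over a validation set. The snippet below is a minimal sketch assuming a PyTorch model that returns one logit vector per example; it does not mirror the repo's exact function names.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, val_loader, device="cpu"):
    """Mean cross-entropy over the validation set and the corresponding perplexity."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)                                  # (batch, vocab_size)
        total_loss += F.cross_entropy(logits, y, reduction="sum").item()
        total_tokens += y.numel()
    mean_ce = total_loss / total_tokens                    # nats per character
    return mean_ce, math.exp(mean_ce)                      # perplexity = exp(mean cross-entropy)
```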

Training

  • For training the Feedforward model (default) - python main.py
  • For training the GPT (gpt-nano) model - python main.py -m gpt

Additional Config Parameters:

The following new hyperparameters are added to the training configuration for both GPT model training (get_gpt_config) and the provided feedforward model training (get_ff_config); the sketch after the list shows how they interact:

  • C.trainer.patience - Number of iterations to wait without improvement in validation loss before training stops
  • C.trainer.validation_interval - Number of training steps between validation runs
  • C.trainer.min_relative_improvement - Minimum relative improvement in validation loss required to count as an improvement; a 5% threshold is written as the decimal 0.05
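
Taken together, these parameters drive a patience-based early-stopping loop along the lines of the sketch below. Only the three C.trainer.* parameters come from the actual config; train_step and evaluate are hypothetical stand-ins for the repo's training and validation routines, and the sketch counts patience in validation checks rather than raw iterations.

```python
def train_with_early_stopping(model, train_batches, val_loader, config, train_step, evaluate):
    """Patience-based early stopping using the three C.trainer.* parameters above.

    `train_step` and `evaluate` are hypothetical callables standing in for the
    repo's training and validation routines."""
    best_val_loss = float("inf")
    stale_checks = 0
    for step, (x, y) in enumerate(train_batches):
        train_step(model, x, y)  # one optimizer update (illustrative)
        if step > 0 and step % config.trainer.validation_interval == 0:
            val_loss = evaluate(model, val_loader)
            # Count an improvement only if it beats the best loss by the relative threshold
            if val_loss < best_val_loss * (1.0 - config.trainer.min_relative_improvement):
                best_val_loss = val_loss
                stale_checks = 0
            else:
                stale_checks += 1
            # Stop after `patience` consecutive validation checks with no improvement
            if stale_checks >= config.trainer.patience:
                break
    return best_val_loss
```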
