I initialised the dataset with the variables Nf and Npts, denoting the number of functions to generate and the number of points to generate per function. During initialisation of the Dataset, the function generate_functions is called, which uses the code provided to generate Nf functions and saves Npts points per function to the member variables x_values and y_values.
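The dataset described above can be sketched roughly as follows. This is only an illustration: the provided generator code is not shown here, so the body of generate_functions (random sinusoids) and the class name FunctionDataset are assumptions, as are the sizes used at the end.

```python
import math
import torch
from torch.utils.data import Dataset


def generate_functions(Nf, Npts, seed=0):
    """Hypothetical stand-in for the provided generator: random sinusoids."""
    g = torch.Generator().manual_seed(seed)
    # One row of sorted x positions per function, drawn uniformly from [-2, 2).
    x = torch.sort(torch.rand(Nf, Npts, generator=g) * 4 - 2, dim=1).values
    amplitude = torch.rand(Nf, 1, generator=g) + 0.5   # in [0.5, 1.5)
    phase = torch.rand(Nf, 1, generator=g) * math.pi   # in [0, pi)
    y = amplitude * torch.sin(x + phase)
    return x, y


class FunctionDataset(Dataset):
    def __init__(self, Nf, Npts):
        self.Nf = Nf
        self.Npts = Npts
        # Generate all functions once, at construction time.
        self.x_values, self.y_values = generate_functions(Nf, Npts)

    def __len__(self):
        return self.Nf

    def __getitem__(self, idx):
        # One item is one whole function: Npts x values and Npts y values.
        return self.x_values[idx], self.y_values[idx]


dataset = FunctionDataset(Nf=100, Npts=20)  # sizes chosen for illustration
```

Indexing the dataset returns one function at a time, so a batch from a DataLoader holds batch_size functions of Npts points each.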
Then, using an 80:20 training_data:test_data split, I divided the dataset into two parts. The DataLoaders train_iter and test_iter are instantiated with their respective datasets and the batch_size, with shuffle set to True to reduce the chance of non-representative batches.
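A minimal sketch of the split and the two DataLoaders, assuming torch.utils.data.random_split is used for the 80:20 division (the report does not say how the split was implemented). The placeholder tensors and the batch size of 16 are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data standing in for the generated functions
# (100 functions, 20 points each).
full_dataset = TensorDataset(torch.randn(100, 20), torch.randn(100, 20))

# 80% of the functions go to training, the remainder to testing.
n_train = int(0.8 * len(full_dataset))
n_test = len(full_dataset) - n_train
training_data, test_data = random_split(full_dataset, [n_train, n_test])

batch_size = 16  # assumed value for this sketch
train_iter = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_data, batch_size=batch_size, shuffle=True)
```

Passing explicit lengths to random_split keeps the two parts disjoint and makes the sizes easy to check.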
I created a Net class that is used for both the encoder and decoder. The Net class has 2 linear layers and uses a ReLU activation function between the linear layers.
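A Net class of this shape might look like the sketch below. The constructor arguments, the hidden size of 32 and the output size of 1 are assumptions made for illustration; only the two linear layers, the ReLU between them and the input size of 2 come from the report.

```python
import torch
from torch import nn


class Net(nn.Module):
    """Two linear layers with a ReLU in between."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.layers(x)


# The same class is instantiated twice; hidden and output sizes are assumed.
encoder = Net(in_dim=2, hidden_dim=32, out_dim=1)
decoder = Net(in_dim=2, hidden_dim=32, out_dim=1)
```

Reusing one class keeps the encoder and decoder structurally identical while giving each its own independent weights.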
The input dimensions of the encoder and decoder are both 2 and the hidden layers are both of dimension
I am using a single Adam optimiser with learning rate set to 0.001 and weight decay set to 0.0005 for both the encoder and decoder models.
I am using MSELoss as it is well suited to regression problems.
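One way to share a single Adam optimiser between two models is to chain their parameters, as sketched below with the learning rate and weight decay quoted above. The stand-in models and the toy forward pass are assumptions for illustration and are not meant to reproduce the actual architecture.

```python
import itertools
import torch
from torch import nn

# Stand-in models; any two nn.Module instances work the same way.
encoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

# One optimiser over the parameters of both models.
optimiser = torch.optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=0.001,
    weight_decay=0.0005,
)
loss_fn = nn.MSELoss()

# One illustrative update step on random data. The forward pass is a toy
# that simply routes data through both models so that both get gradients.
inputs, targets = torch.randn(8, 2), torch.randn(8, 1)
optimiser.zero_grad()
hidden = encoder(inputs)                                # shape (8, 1)
decoder_in = torch.cat([hidden, inputs[:, :1]], dim=1)  # shape (8, 2)
loss = loss_fn(decoder(decoder_in), targets)
loss.backward()
optimiser.step()
```

With a single optimiser, one call to step() updates the encoder and decoder together, which keeps the training loop short.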
In the training loop, I first set the models to train mode. I use the Accumulator class from my_utils.py to collect the loss metrics. For every batch, I sample the context points and pass them into the encoder and get
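The contents of my_utils.py are not reproduced here, so the following is only a hypothetical sketch of what an Accumulator for loss metrics could look like: a small class that keeps running sums. The batch losses and batch sizes in the usage lines are made-up numbers.

```python
class Accumulator:
    """Keeps running sums over n quantities (hypothetical sketch)."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add each new value to the matching running sum.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


# Accumulate (summed loss, number of samples) over three made-up batches.
metric = Accumulator(2)
for batch_loss, batch_size in [(0.5, 16), (0.25, 16), (0.25, 8)]:
    metric.add(batch_loss * batch_size, batch_size)
mean_loss = metric[0] / metric[1]
```

Weighting each batch loss by its batch size gives a per-sample mean that stays correct even when the last batch is smaller than the others.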
I was able to achieve a training loss of around 0.073, with a validation loss of 0.077.
I experimented with several learning rates between 0.001 and 0.01, and 0.005 seemed to produce the lowest loss.
It seems that the lower the better with
Out of 5, 8, 16 and 100, batch size 16 provided the most consistently low loss.
I tried both SGD and Adam but the loss seemed to get stuck at 0.27 with SGD. Adam allowed for the loss to be reduced further.
Number of Hidden Layers
When I added an additional layer to the encoder and decoder, the loss increased by ~0.02. It seems that 2 layers are sufficient for the problem, and adding more may be detrimental to the model in terms of both computation and accuracy.
Using the test dataset I created in task 1, I calculated the loss and plotted several functions to see how my model performs in real terms.
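Computing a test loss of this kind could be sketched as below. The evaluate helper, the single placeholder model and the random data are assumptions for illustration; the real evaluation would run the encoder and decoder over test_iter.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, data_iter, loss_fn):
    """Return the mean per-sample loss of `model` over `data_iter`."""
    model.eval()                   # switch off training-only behaviour
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():          # no gradients needed for evaluation
        for inputs, targets in data_iter:
            loss = loss_fn(model(inputs), targets)
            # Weight by batch size so a smaller final batch is handled.
            total_loss += loss.item() * inputs.shape[0]
            n_samples += inputs.shape[0]
    return total_loss / n_samples


# Placeholder model and data, only to show the call.
model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
test_iter = DataLoader(
    TensorDataset(torch.randn(20, 2), torch.randn(20, 1)), batch_size=16
)
test_loss = evaluate(model, test_iter, nn.MSELoss())
```

Calling model.eval() and wrapping the loop in torch.no_grad() ensures the test pass neither updates the model nor builds a computation graph.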
The results from applying my model to the test sequences in test_data.pkl: