What can be done with a Vaswani Transformer other than translation?
This repo will begin with the code from The Annotated Transformer.
So when I started this, I was having a problem pushing this change to GitHub. That got me chasing down the Connecting to GitHub with SSH path, which I set up on KAUWITB. A quick test proved it was working. Nice!
I want to provide time stamps in this log to indicate the duration between steps. Let's start with creating the code for the model, shall we ...
I am thinking: how do I use the code in Transformers.ipynb to implement JUST an encoder, and then JUST a decoder? The Andrej Karpathy minGPT code is just a decoder, and I think the same can be said for his other example, nanoGPT. Look at these examples for guidance on how to do this.
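To convince myself of the split, here's a minimal sketch using PyTorch's stock nn.TransformerEncoder rather than the Annotated Transformer's own classes (sizes here are arbitrary, not from either code base):

```python
import torch
import torch.nn as nn

# Encoder-only: bidirectional self-attention over the whole sequence.
enc_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=6)

src = torch.rand(2, 10, 512)          # (batch, seq, d_model)
enc_out = encoder(src)                # same shape as src

# Decoder-only, GPT-style: the same stack, but with an additive causal
# mask so position t can only attend to positions <= t.
causal_mask = torch.triu(torch.full((10, 10), float("-inf")), diagonal=1)
dec_out = encoder(src, mask=causal_mask)

print(enc_out.shape, dec_out.shape)   # torch.Size([2, 10, 512]) twice
```

The takeaway for me: "just a decoder" in the minGPT/nanoGPT sense is really an encoder stack plus a causal mask, since with no encoder output there is nothing to cross-attend to.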
Continuing with building the workflow.
Studying the code found in nanoGPT, which is really excellent.
The model.py code found in nanoGPT bears a striking similarity to the code found in The Annotated Transformer.
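To make the similarity concrete, here's a from-memory sketch of the causal self-attention module that sits at the heart of both code bases. This is NOT the actual nanoGPT source; the names (qkv, proj) and default sizes are mine:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask -- the shared core of
    nanoGPT's model.py and The Annotated Transformer. A sketch, not either
    project's actual code."""
    def __init__(self, d_model=384, n_head=6, block_size=256):
        super().__init__()
        assert d_model % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused q, k, v projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        # lower-triangular mask: position t may attend to positions <= t
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for per-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```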
Studying the Shakespeare example in the nanoGPT code base because I want to try to implement something like that here.
If you peek inside the config/train_shakespeare_char.py file of the nanoGPT code base, you'll see that we're training a GPT with a context size of up to 256 characters, 384 feature channels (the width of the embeddings), and it is a 6-layer Transformer with 6 heads in each layer.
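For the record, the four settings called out above should look roughly like this in that config file (paraphrased from memory, so treat the exact lines as approximate; the file sets other knobs too):

```python
# config/train_shakespeare_char.py (excerpt, paraphrased)
block_size = 256   # context length, in characters
n_layer = 6        # transformer blocks
n_head = 6         # attention heads per block
n_embd = 384       # embedding width / feature channels
```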
Yup, getting back to this cuz I want to resume my deep study of transformers. And I will once again begin with the code from The Annotated Transformer. I am curious to see just how much of this code I can get to run on KAUWITB.
The Docker container hfpt_Sept1 is not going to work for this repo. Next I'm gonna try a pure PyTorch Docker container.
Now trying the Docker container sad_nightingale, which was spun up from the image pt1131:20230216.
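My first move in any candidate container is a quick smoke test to confirm the PyTorch build and that CUDA is actually visible from inside Docker:

```python
import torch

# quick smoke test for a fresh container
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```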