kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Running on Colab TPU only gives random words and nonsensical outputs

JohnnyRacer opened this issue · comments

Cannot get the model to generate anything like the demo showed. The model would only generative non cohesive gibberish. There were no errors presented at runtime which I thought was very puzzling since the text output was so bad. The notebook was modified slightly to allow for easier installation of the dependencies. Could this be some strange OOM error as Colab usually gives me TPU-v2 which I've heard has insufficient RAM to run this. Any help would be much appreciated.

Link to the notebook with outputs:

https://colab.research.google.com/drive/1GFF56OtZlSoRFaZF35BBwoNgCfzjE-4m#scrollTo=nvlAK6RbCJYg

commented

You commented out the read_ckpt_lowmem line that actually loads the model checkpoint into memory, so it's using a randomly initialized model that will generate gibberish.

Thanks for the quick reply. This was exactly the issue.