limapedro / slms

Experimenting with small language models

Small Language Models

This repo contains the code to train a simple language model on the TinyStories dataset.

The purpose of this repo is to let me experiment with creating small language models (SLMs) for very specific tasks, in this case writing short stories. The idea comes from combining the LIMA and TinyStories research papers, which describe how small but high-quality, curated datasets can improve the performance of language models.

Hosting

This model is not officially hosted. You can, however, download and use the model through HuggingFace AutoModels. Here is the link.
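As a rough sketch of what loading through AutoModels might look like: the snippet below assumes the checkpoint works with `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library (with PyTorch installed). The helper `generate_story`, its defaults, and the placeholder model id are illustrative and not part of this repo. If the model uses a custom architecture, `from_pretrained` may also need `trust_remote_code=True`.

```python
def generate_story(model_id, prompt="Once upon a time", max_new_tokens=200):
    """Load a causal LM from the HuggingFace Hub and sample a continuation.

    `model_id` is the Hub repository id taken from the link above
    (hypothetical argument; this helper is not part of the repo).
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a continuation; top_k=50 is an arbitrary illustrative choice.
    output_ids = model.generate(
        inputs["input_ids"],
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_k=50,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (replace the placeholder with the real repository id):
# print(generate_story("<user>/<model-name>"))
```

Sampling is used here rather than greedy decoding because very small models tend to loop when decoded greedily; that is a general observation, not something measured for this model.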

Training

All code needed to train and run the model is provided, including code to train a custom tokenizer.

Using the hyperparameters specified in hyperparameters.json, I was able to train a 1.456M-parameter model that achieves a loss of 0.493 on the validation dataset. If you are up to it, please feel free to try to beat this number (I do not suspect it will be very hard).
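To put the loss figure in more familiar terms, it can be converted to perplexity. This small sketch assumes the reported 0.493 is a mean per-token cross-entropy measured in nats (the usual default in PyTorch-style training loops), which the README does not state explicitly. The function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


val_loss = 0.493  # validation loss reported above
val_ppl = perplexity(val_loss)
print(f"validation perplexity is about {val_ppl:.3f}")  # about 1.637
```

Per-token perplexity depends on the tokenizer, so this number is only comparable to other models trained with the same custom tokenizer.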

Training was done on a very shitty Nvidia 1650 Ti GPU with 4GB of VRAM and took about 40 minutes to complete.

Sample

Here is a short sample generated by the trained model:

Once upon a time, in a big tree, there lived a little bird. Fred was very rough, and he always enjoyed looking for new nest. One day, Brown saw a little bird flying by. He thought it would be a friend and it was a h passion, too far. It was so long that it was a big with long beak. The little bird thought of sound could see it square apples, so it started to feel smoke on a branch. He hopped and flew, feeling so safe that the wind was getting ready to go home. Suddenly, she felt embarrassed. Just then, a big bird realized there was s happening next to its nest. She decided to care of the nest and not take it away from the bird down and seemed to hurt him. Some nest was also nest, growing tall by the nest and the p ripped its wings around them. The little bird was defied. They had to catch the butterfly. The bird learned a r own miner and the lesson: it's important to ask for help ring. Browned at ue, t we are because you can make it feel better." The little bird on the ground shook its head and tried to catch the leak . Now, I have an eow!

No, it's not completely coherent, but we are getting close!

Languages

Language:Python 100.0%