karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from Github: https://github.com/karpathy/llm.c

WikiText 103 evaluation

karpathy opened this issue

I've seen some repos use WikiText-103 as the dataset to eval GPT-like models, e.g.:

https://github.com/tysam-code/hlb-gpt/tree/main

Add a prepro script to download, preprocess, and tokenize WikiText-103, just like tiny shakespeare / tiny stories, following this repo. Adapt the mainline training script train_gpt2.cu to report the validation performance on this set.
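A minimal sketch of what that prepro script could look like, modeled on the existing tiny shakespeare / tiny stories scripts. Fetching via HuggingFace `datasets`, the output filenames, and the plain uint16 token dump are assumptions here, not the repo's final format; the real script should write whatever header/layout train_gpt2.cu expects for its .bin shards.

```python
# Hypothetical wikitext103.py prepro sketch (names and file layout are assumptions)
import os
import numpy as np
import tiktoken
from datasets import load_dataset

def tokenize_split(split_name, out_path, enc):
    # stream one split of WikiText-103 and tokenize with the GPT-2 BPE
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split=split_name)
    eot = enc.eot_token  # GPT-2 <|endoftext|>, used here as a document separator
    tokens = []
    for row in ds:
        text = row["text"]
        if not text.strip():
            continue  # skip the dataset's empty lines
        tokens.append(eot)
        tokens.extend(enc.encode_ordinary(text))
    arr = np.array(tokens, dtype=np.uint16)  # GPT-2 vocab (50257) fits in uint16
    arr.tofile(out_path)  # raw token dump; swap in the repo's .bin header if needed
    print(f"wrote {len(arr)} tokens to {out_path}")

if __name__ == "__main__":
    os.makedirs("data/wikitext103", exist_ok=True)
    enc = tiktoken.get_encoding("gpt2")
    tokenize_split("validation", "data/wikitext103/wikitext103_val.bin", enc)
    tokenize_split("train", "data/wikitext103/wikitext103_train.bin", enc)
```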

Add Python code that does the same: evaluates on WikiText-103 and reports performance for all the GPT-2 model sizes. This is our baseline to reach when training from scratch init.
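A rough sketch of that Python baseline, assuming HuggingFace transformers/datasets checkpoints are acceptable for the reference numbers. It chunks the WikiText-103 validation text into non-overlapping 1024-token windows and reports mean per-token loss and perplexity for each GPT-2 size; the windowing (no stride/overlap, no document-boundary handling) is a simplification of how GPT-2 perplexities are usually reported.

```python
# Hypothetical baseline eval sketch over the four GPT-2 sizes (assumptions noted above)
import math
import torch
import tiktoken
from datasets import load_dataset
from transformers import GPT2LMHeadModel

@torch.no_grad()
def eval_wikitext103(model_name, token_ids, ctx_len=1024, device="cpu"):
    # mean per-token cross-entropy over non-overlapping ctx_len windows
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device).eval()
    total_loss, total_tokens = 0.0, 0
    for i in range(0, len(token_ids) - ctx_len, ctx_len):
        chunk = torch.tensor(token_ids[i:i + ctx_len], device=device).unsqueeze(0)
        out = model(chunk, labels=chunk)  # HF shifts labels internally
        n = chunk.numel() - 1             # tokens actually predicted in this window
        total_loss += out.loss.item() * n
        total_tokens += n
    mean_loss = total_loss / total_tokens
    return mean_loss, math.exp(mean_loss)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    enc = tiktoken.get_encoding("gpt2")
    val = load_dataset("wikitext", "wikitext-103-raw-v1", split="validation")
    token_ids = enc.encode_ordinary("\n".join(row["text"] for row in val))
    for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
        loss, ppl = eval_wikitext103(name, token_ids, device=device)
        print(f"{name}: val loss {loss:.4f}, perplexity {ppl:.2f}")
```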

Optionally help research other ways that people have evaluated GPT-2 models, or attempted to reproduce them in the past.

commented

We are abandoning WikiText-103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.