Benchmarking with GPT-2
leejason opened this issue
leejason commented
Any suggestions for benchmarking CTRL against GPT-2? For example, loss, perplexity (PPL), or any other metric that measures text generation quality?
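Here is a rough sketch of how the loss and PPL mentioned above relate. Suppose an evaluation loop reports the mean per-token cross-entropy in nats (natural log). Perplexity is then just the exponential of that value. The helper names are hypothetical, and the log-probabilities in the example are made-up illustrative values, not outputs from either model.

```python
import math
from typing import Sequence


def perplexity_from_token_logprobs(token_logprobs: Sequence[float]) -> float:
    """Per-token perplexity from the natural-log probability of each target token.

    Perplexity = exp(mean negative log-likelihood), i.e. exp of the usual
    cross-entropy loss measured in nats.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def perplexity_from_loss(mean_loss_nats: float) -> float:
    """If a loop already reports the mean per-token loss in nats, PPL = exp(loss)."""
    return math.exp(mean_loss_nats)


# Hypothetical example: a model assigns probability 0.25 to each of 4 target tokens.
logprobs = [math.log(0.25)] * 4
print(perplexity_from_token_logprobs(logprobs))  # approximately 4.0
```

A perplexity of 4 can be read as the model being, on average, as uncertain as a uniform choice among 4 tokens at each step. If the loss is reported in bits (log base 2) instead, use `2 ** loss` rather than `math.exp`.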
Julien Chaumond commented
Not a direct answer to your question, but this (timely) article by @chiphuyen is really good:
https://thegradient.pub/understanding-evaluation-metrics-for-language-models/
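There is a caveat when comparing CTRL and GPT-2 this way. The two models use different tokenizers, so per-token perplexity counts a different unit for each model. Metric write-ups such as the one linked above also cover bits-per-character, which points to a common workaround. Sum each model's negative log-likelihood (NLL) over the same raw evaluation text, then normalize by characters or words instead of tokens. The sketch below assumes you already have those summed NLLs in nats. The function names, the text, and the scores are all hypothetical.

```python
import math


def per_token_perplexity(total_nll_nats: float, num_tokens: int) -> float:
    # Depends on each model's own tokenizer, so not comparable across vocabularies.
    return math.exp(total_nll_nats / num_tokens)


def word_level_perplexity(total_nll_nats: float, num_words: int) -> float:
    # Same summed NLL, normalized by a tokenizer-independent unit (words).
    return math.exp(total_nll_nats / num_words)


def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    # Convert nats to bits (divide by ln 2) and normalize by character count.
    return total_nll_nats / (num_chars * math.log(2))


text = "the cat sat"
num_words = len(text.split())  # 3 words
num_chars = len(text)          # 11 characters, including spaces

# Hypothetical scores as (summed NLL in nats, token count):
# model A splits the text into 3 tokens, model B into 5.
scores = {"model_a": (6.0, 3), "model_b": (6.5, 5)}

for name, (total_nll, num_tokens) in scores.items():
    print(
        name,
        f"token PPL={per_token_perplexity(total_nll, num_tokens):.2f}",
        f"word PPL={word_level_perplexity(total_nll, num_words):.2f}",
        f"BPC={bits_per_character(total_nll, num_chars):.3f}",
    )
```

In this made-up example, model_b has the lower per-token perplexity only because it spreads a larger total NLL over more tokens. Measured per word or per character, model_a assigns the text a higher probability.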
leejason commented
Very helpful, thanks!