Improve representation of evaluationResults so we can build automated leaderboards
dgarijo opened this issue · comments
Evaluation results are currently represented as free text. We should build on model cards and propose a more structured representation so that evaluation outputs can be compared across models.
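To make the idea concrete, here is a minimal sketch of what a structured result could look like and how a leaderboard could be derived from it. Everything in it is an assumption for illustration only: the `EvaluationResult` class, its field names, the `leaderboard` helper, and the model names and scores are all made up and are not part of any existing schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvaluationResult:
    """One metric value for one model on one dataset.

    Hypothetical structure: the field names are illustrative, not an agreed schema.
    """
    model: str
    dataset: str
    metric: str
    value: float
    higher_is_better: bool = True
    split: Optional[str] = None  # e.g. "test"; optional


def leaderboard(results: List[EvaluationResult], dataset: str, metric: str) -> List[EvaluationResult]:
    """Return the results for one dataset/metric pair, best first."""
    comparable = [r for r in results if r.dataset == dataset and r.metric == metric]
    if not comparable:
        return []
    # Assumes every result for a given metric agrees on its direction.
    descending = comparable[0].higher_is_better
    return sorted(comparable, key=lambda r: r.value, reverse=descending)


# Made-up example values, only to show the ranking.
results = [
    EvaluationResult("model-a", "example-dataset", "accuracy", 0.91),
    EvaluationResult("model-b", "example-dataset", "accuracy", 0.94),
    EvaluationResult("model-a", "example-dataset", "rmse", 0.20, higher_is_better=False),
    EvaluationResult("model-c", "example-dataset", "rmse", 0.12, higher_is_better=False),
]

for rank, r in enumerate(leaderboard(results, "example-dataset", "accuracy"), start=1):
    print(rank, r.model, r.value)
```

Keeping the dataset, metric name, and numeric value as separate fields is what makes the comparison possible: with free text, none of the filtering or sorting above could be done without parsing.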