bigscience-workshop/evaluation
Code and Data for Evaluation WG
Stargazers: 41
Watchers: 23
Issues: 51
Forks: 24
bigscience-workshop/evaluation Issues
Translate validation prompts into all training languages
Updated
2 years ago
Comments count
9
Add MasakhaNER to Full Benchmark
Updated
2 years ago
Comments count
5
Add DiaBLa to Full Benchmark
Updated
2 years ago
Comments count
5
Add HuffPo Text Classification to Full Benchmark
Updated
2 years ago
Comments count
3
Add GEM ToTTo to Full Benchmark
Updated
2 years ago
Comments count
3
Add GEM XSum to Full Benchmark
Closed
2 years ago
Comments count
4
Add WinoMT to Full Benchmark
Updated
2 years ago
Comments count
2
Add LinCE Testbed to Full Benchmark
Updated
2 years ago
Comments count
2
Add GEM WebNLG to Full Benchmark
Updated
2 years ago
Comments count
2
Add CrowS-Pairs to Full Benchmark
Updated
2 years ago
Comments count
5
Add BLiMP to Full Benchmark
Updated
2 years ago
Comments count
5
Add LAMA to Full Benchmark
Updated
2 years ago
Comments count
6
Add GEM MLSum to Full Benchmark
Updated
2 years ago
Comments count
2
Add GEM WikiAuto to Full Benchmark
Updated
2 years ago
Comments count
1
Add BioASQ to Full Benchmark
Updated
2 years ago
Comments count
2
Add QASPER to Full Benchmark
Updated
2 years ago
Comments count
2
Add WikiANN to Full Benchmark
Updated
2 years ago
Comments count
7
Add TyDiQA to Full Benchmark
Updated
2 years ago
Comments count
3
Add MNLI to Full Benchmark
Updated
2 years ago
Comments count
3
Add HANS to Full Benchmark
Updated
2 years ago
Comments count
3
Add SuperGLUE Tasks to Full Benchmark
Updated
2 years ago
Comments count
2
Add ANLI to Full Benchmark
Updated
2 years ago
Comments count
2
Add Jigsaw Toxicity Classification to Full Benchmark
Updated
2 years ago
Comments count
4
Add MKQA to Full Benchmark
Updated
2 years ago
Add QA-SRL to Full Benchmark
Updated
2 years ago
Comments count
2
Add XQuAD to Full Benchmark
Closed
3 years ago
Comments count
1
Add PIAF to Full Benchmark
Closed
3 years ago
Comments count
1
Add GEM Wikilingua to Full Benchmark
Updated
3 years ago
Comments count
1
Add WMT to Full Benchmark
Updated
3 years ago
Comments count
2
Refactor task template to merge multilingual.json and english.json
Updated
3 years ago
Add LAMBADA to validation set
Closed
3 years ago
Create toy tasks/dummy code for prompt-based evals
Closed
3 years ago
Comments count
1
Wrap evaluation benchmark using HF-trainer
Updated
3 years ago
Comments count
2
Benchmark mT5 on TyDiQA prompting setup
Updated
3 years ago
Comments count
1
Create Targeted Minimal Pair "Stress-Tests" for Sensitivity to Social Groups
Updated
3 years ago
Comments count
2
Create toy tasks/dummy code for fine-tuning evals
Updated
3 years ago
Comments count
1
Start Overleaf for benchmark tech report
Updated
3 years ago
Comments count
1
Add PIQA to validation set
Closed
3 years ago
Set up testing
Updated
3 years ago
Comments count
2
Set code conventions
Closed
3 years ago
Comments count
3
Add GEM E2E to Full Benchmark
Updated
3 years ago
Comments count
2
Convert validation code to work with Megatron as well as huggingface
Updated
3 years ago
Add POS Tagging with UD to Full Benchmark
Updated
3 years ago
Comments count
1
Add Flores 101 to Full Benchmark (as LM tasks, not MT tasks)
Updated
3 years ago
Comments count
1
Add XSum to Full Benchmark
Closed
3 years ago
Comments count
5
Adding TyDi QA to simple_benchmark
Closed
3 years ago
Add CRD3 to Full Benchmark
Updated
3 years ago
Comments count
2
Add TyDiQA for non-training languages to Full Benchmark
Updated
3 years ago
Add Edge Probing Suite to Full Benchmark
Updated
3 years ago
Add GEM Response Generation to Full Benchmark
Updated
3 years ago