bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.

Instantiating `DatasetTemplates` - Raising an error when dataset name is not found

VictorSanh opened this issue

>>> DatasetTemplates('ag_news').__dict__
{'dataset_name': 'ag_news', 'subset_name': None, 'templates': {'24e44a81-a18a-42dd-a71c-5b31b2d2cb39': <promptsource.templates.Template object at 0x7fdc493dba30>, '8fdc1056-1029-41a1-9c67-354fc2b8ceaf': <promptsource.templates.Template object at 0x7fdc493dba00>, '918267e0-af68-4117-892d-2dbe66a58ce9': <promptsource.templates.Template object at 0x7fdc493e99a0>, '9345df33-4f23-4944-a33c-eef94e626862': <promptsource.templates.Template object at 0x7fdc493e99d0>, '98534347-fff7-4c39-a795-4e69a44791f7': <promptsource.templates.Template object at 0x7fdc493e9a00>, 'b401b0ee-6ffe-4a91-8e15-77ee073cd858': <promptsource.templates.Template object at 0x7fdc493e9940>, 'cb355f33-7e8c-4455-a72b-48d315bd4f60': <promptsource.templates.Template object at 0x7fdc493e9970>}, 'name_to_id_mapping': {'classify_question_first': '24e44a81-a18a-42dd-a71c-5b31b2d2cb39', 'classify_with_choices_question_first': '8fdc1056-1029-41a1-9c67-354fc2b8ceaf', 'recommend': '918267e0-af68-4117-892d-2dbe66a58ce9', 'which_section_choices': '9345df33-4f23-4944-a33c-eef94e626862', 'which_section': '98534347-fff7-4c39-a795-4e69a44791f7', 'classify_with_choices': 'b401b0ee-6ffe-4a91-8e15-77ee073cd858', 'classify': 'cb355f33-7e8c-4455-a72b-48d315bd4f60'}}
>>> DatasetTemplates('superglue').__dict__
{'dataset_name': 'superglue', 'subset_name': None, 'templates': {}, 'name_to_id_mapping': {}}
>>> DatasetTemplates('super_glue').__dict__
{'dataset_name': 'super_glue', 'subset_name': None, 'templates': {}, 'name_to_id_mapping': {}}
>>> DatasetTemplates('super_glue/rte').__dict__
{'dataset_name': 'super_glue/rte', 'subset_name': None, 'templates': {'2b52a83c-0021-41fe-b44c-5aaa076d71a2': <promptsource.templates.Template object at 0x7fdc493ed7c0>, '2d0d63da-ffcf-4f6e-941a-b8da922be43e': <promptsource.templates.Template object at 0x7fdc493ed790>, '4163e6f1-1a83-4c73-b867-02eb7ac80316': <promptsource.templates.Template object at 0x7fdc493f0730>, '8fb1c6aa-20e9-438c-bece-c6af1c746449': <promptsource.templates.Template object at 0x7fdc493f0760>, '9e078fb4-505b-413c-bb5e-3cd16ddcf5d7': <promptsource.templates.Template object at 0x7fdc493f0790>, 'b8dc85c6-28b6-4340-979a-8e77c2a0dde8': <promptsource.templates.Template object at 0x7fdc493f06d0>, 'e2fb58f2-b1f2-4aef-b74b-c4ee1c571fff': <promptsource.templates.Template object at 0x7fdc493f0700>, 'ed1f4b75-8826-4852-9bd6-aedf368678f5': <promptsource.templates.Template object at 0x7fdc493f04f0>, 'ee0ce095-122a-4509-bf0b-33d1495295f7': <promptsource.templates.Template object at 0x7fdc493f0670>, 'fb4f8144-37f5-4977-88da-37a5d0bfd0e8': <promptsource.templates.Template object at 0x7fdc493f0490>}, 'name_to_id_mapping': {'MNLI crowdsource': '2b52a83c-0021-41fe-b44c-5aaa076d71a2', 'guaranteed true': '2d0d63da-ffcf-4f6e-941a-b8da922be43e', 'can we infer': '4163e6f1-1a83-4c73-b867-02eb7ac80316', 'GPT-3 style': '8fb1c6aa-20e9-438c-bece-c6af1c746449', 'does this imply': '9e078fb4-505b-413c-bb5e-3cd16ddcf5d7', 'should assume': 'b8dc85c6-28b6-4340-979a-8e77c2a0dde8', 'does it follow that': 'e2fb58f2-b1f2-4aef-b74b-c4ee1c571fff', 'based on the previous passage': 'ed1f4b75-8826-4852-9bd6-aedf368678f5', 'justified in saying': 'ee0ce095-122a-4509-bf0b-33d1495295f7', 'must be true': 'fb4f8144-37f5-4977-88da-37a5d0bfd0e8'}}

We should raise an error (a `NameError`, for instance) when the dataset can't be found, instead of silently returning an object with empty `templates` and `name_to_id_mapping`.

Hi @VictorSanh, would checking `name_to_id_mapping` after the `sync_mapping` call in `__init__` be enough to determine whether to raise the `NameError`?
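
For illustration, here is a minimal sketch of that check. It uses a stand-in class, not the real `DatasetTemplates`: the class name `DatasetTemplatesSketch`, the in-memory `_FAKE_STORE`, and the body of `sync_mapping` are all assumptions made so the example runs on its own. Only the attribute names and the example IDs come from the output above.

```python
class DatasetTemplatesSketch:
    """Stand-in for DatasetTemplates; NOT the promptsource implementation."""

    # Hypothetical in-memory store standing in for wherever templates are loaded from.
    _FAKE_STORE = {
        "ag_news": {"classify": "cb355f33-7e8c-4455-a72b-48d315bd4f60"},
        "super_glue/rte": {"GPT-3 style": "8fb1c6aa-20e9-438c-bece-c6af1c746449"},
    }

    def __init__(self, dataset_name, subset_name=None):
        self.dataset_name = dataset_name
        self.subset_name = subset_name
        self.name_to_id_mapping = {}
        self.sync_mapping()
        # Proposed check: an empty mapping after syncing means nothing was found.
        if not self.name_to_id_mapping:
            raise NameError(f"No templates found for dataset {dataset_name!r}")

    def sync_mapping(self):
        # Assumed behavior: rebuild the name -> id mapping for this dataset.
        self.name_to_id_mapping = dict(self._FAKE_STORE.get(self.dataset_name, {}))


if __name__ == "__main__":
    print(DatasetTemplatesSketch("ag_news").name_to_id_mapping)
    try:
        DatasetTemplatesSketch("superglue")
    except NameError as err:
        print("raised:", err)
```

One caveat with this approach: an empty mapping cannot distinguish a misspelled dataset name from a valid dataset that simply has no templates yet, so the check may need to be skipped in code paths that create the first template for a dataset.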

Haven't looked closely yet tbh, but feel free to open a PR to address that if you feel like it.
cc @arnaudstiegler since you created that class + @stephenbach