Possibility to use custom datasets
lauritowal opened this issue
Hi - I have a custom dataset (e.g. https://huggingface.co/datasets/lauritowal/redefine-math) and would like to create prompts for it by using PromptSource:
-Download the repo
-Navigate to the root directory of the repo
-Run pip install -e . to install the promptsource module
It seems there is no option for loading a custom dataset into the tool via the web GUI. Maybe I am missing something, but this would be a very helpful feature. (If there is already a way to achieve this, please let me know!)
Thanks!
Hi @lauritowal,
The easiest way right now is to add your user name to the list of additional user names to include in the toggle:
promptsource/promptsource/templates.py
Line 29 in b860c0b
That way, your datasets will appear in promptsource. You can make that change locally since you are already installing from source.
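To illustrate the idea of a user allow-list, here is a minimal sketch. It is not the code in templates.py: the helper name `owner_is_included` is hypothetical, and the assumption is that a dataset id of the form `user/name` is kept only when `user` is in the set. Datasets without a user prefix are governed by other rules that this sketch does not cover.

```python
# Hypothetical sketch of an allow-list check; not promptsource's actual code.
# The set below includes the user name from this thread as the local change.
INCLUDED_USERS = {"Zaid", "craffel", "lauritowal"}


def owner_is_included(dataset_id: str) -> bool:
    """Return True if the id has a user prefix and that user is allow-listed."""
    if "/" not in dataset_id:
        # No user prefix: decided by other rules, out of scope for this sketch.
        return False
    user, _name = dataset_id.split("/", 1)
    return user in INCLUDED_USERS


# Only ids whose owner is in INCLUDED_USERS survive the filter.
candidate_ids = ["lauritowal/redefine-math", "someone/other-dataset", "squad"]
visible = [d for d in candidate_ids if owner_is_included(d)]
```

Note that a check like this can only keep ids that are present in the candidate list to begin with.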
@VictorSanh Wouldn't that require having a template.yml file in the TEMPLATES_FOLDER_PATH? I don't have a template.yml file for my dataset above...
Okay,
- I've added my huggingface username lauritowal to:
INCLUDED_USERS = {"Zaid", "craffel", "lauritowal"}
- I manually created a templates.yaml under
/promptsource/templates/lauritowal/redefine_math/templates.yaml
dataset: lauritowal/redefine_math
templates:
  02ff2949-0f45-4d97-941e-6fa4c0afbc2d: !Template
    answer_choices: 0 ||| 1
    id: 02ff2949-0f45-4d97-941e-6fa4c0afbc2f
    jinja: Question... {{text}} ||| {{ answer_choices [label] }}
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Accuracy
      original_task: true
    name: Choices
    reference: ''
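For readers unfamiliar with the `|||` separator in the template above, here is a small sketch of how it divides fields, assuming that `|||` separates the answer choices from each other and the prompt from the target. The helper `split_fields` and the example row are hypothetical, and the rendered string is written by hand rather than produced by jinja.

```python
from typing import List

SEPARATOR = "|||"


def split_fields(text: str) -> List[str]:
    """Split on the ||| separator and trim surrounding whitespace."""
    return [part.strip() for part in text.split(SEPARATOR)]


# answer_choices from the template: "0 ||| 1"
answer_choices = split_fields("0 ||| 1")

# Hypothetical row; the field names `text` and `label` come from the jinja line.
example = {"text": "What is 2 + 2?", "label": 1}

# Hand-written stand-in for what the jinja line would render to for this row.
rendered = f"Question... {example['text']} ||| {answer_choices[example['label']]}"

# The part before ||| is the prompt, the part after it is the target.
prompt, target = split_fields(rendered)
```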
However, I still can't see my dataset under "Sourcing" (or anywhere else)
@lauritowal are you developing on a fork? I can help debug from there
@VictorSanh sure, have a look at: Cadenza-Labs@a0ac44c
Thanks a lot!
@VictorSanh did it work for you?
Okay, it seems my dataset is not in the list of datasets returned by: https://huggingface.co/api/datasets?full=true
Indeed, that seems to be the root cause. I have asked internally why your dataset does not appear in the result of the query. Will get back to you tomorrow morning; most of the team is in Europe.
Fixed!
api/datasets/ is now paginated and your dataset was appearing on the 2nd page.
LMK if it works on your side!
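For anyone hitting the same symptom, here is a minimal sketch of reading every page of a paginated listing rather than only the first. It assumes the server advertises the next page through an HTTP `Link` header with `rel="next"` and that each item carries an `id` field. The function names are hypothetical and this is not the fix that was applied in promptsource.

```python
import json
import re
import urllib.request
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (items, raw Link header)


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="?([^",;]+)"?', link_header):
        url, rel = match.groups()
        if rel == "next":
            return url
    return None


def fetch_with_urllib(url: str) -> Page:
    """Fetch one page of JSON items plus its Link header (network call)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response), response.headers.get("Link")


def iter_dataset_ids(fetch_page: Callable[[str], Page], first_url: str) -> Iterator[str]:
    """Yield item ids from every page, following rel="next" until it is absent."""
    url: Optional[str] = first_url
    while url:
        items, link_header = fetch_page(url)
        for item in items:
            yield item["id"]
        url = next_link(link_header)


# Offline demonstration with two fake pages standing in for the real endpoint.
fake_pages = {
    "page-1": ([{"id": "squad"}], '<page-2>; rel="next"'),
    "page-2": ([{"id": "lauritowal/redefine-math"}], None),
}
all_ids = list(iter_dataset_ids(fake_pages.__getitem__, "page-1"))
```

Passing `fetch_with_urllib` in place of the fake lookup would run the same loop against a live URL; the fetcher is injected so the paging logic can be checked without network access.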
it works now, awesome! Thanks a lot @VictorSanh
amazing! closing this then