bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.

Possibility to use custom datasets

lauritowal opened this issue · comments

Hi - I have a custom dataset (e.g. https://huggingface.co/datasets/lauritowal/redefine-math) and would like to create prompts for it by using PromptSource:

- Download the repo
- Navigate to the root directory of the repo
- Run pip install -e . to install the promptsource module

It seems there is no option for loading a custom dataset into the tool via the web GUI. Maybe I am missing something, but this would be a very helpful feature... (If there is already a way to achieve this, please let me know!)

Thanks!

Hi @lauritowal,

The easiest way right now is to add your username to the list of additional usernames that the toggle includes:

INCLUDED_USERS = {"Zaid", "craffel"}

That way, your datasets will appear in promptsource. You can make that change locally since you are already installing from source.
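To make the effect of that set concrete, here is a minimal sketch of a user-based filter. It is an illustration under assumptions, not the promptsource code itself: the function name `visible_datasets` is hypothetical, and the rule that an ID without a `/` counts as a non-community dataset is assumed from how Hub IDs are written.

```python
# Hypothetical sketch: which dataset IDs pass a user allow-list.
INCLUDED_USERS = {"Zaid", "craffel"}


def visible_datasets(dataset_ids, included_users=INCLUDED_USERS):
    """Keep IDs without a namespace, plus IDs whose namespace is allowed."""
    visible = []
    for dataset_id in dataset_ids:
        if "/" not in dataset_id:
            # Assumed: no "user/" prefix means it is not a community dataset.
            visible.append(dataset_id)
            continue
        user = dataset_id.split("/", 1)[0]
        if user in included_users:
            visible.append(dataset_id)
    return visible


ids = ["squad", "craffel/some_dataset", "lauritowal/redefine-math"]
default_view = visible_datasets(ids)
extended_view = visible_datasets(ids, INCLUDED_USERS | {"lauritowal"})
```

With the default set the `lauritowal/...` entry is dropped; adding the username to the set lets it through.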

@VictorSanh Wouldn't that require having a templates.yaml file in the TEMPLATES_FOLDER_PATH? I don't have a templates.yaml file for my dataset above...

Okay,

  1. I've added my huggingface username lauritowal to:

INCLUDED_USERS = {"Zaid", "craffel", "lauritowal"}

  2. I manually created a templates.yaml under /promptsource/templates/lauritowal/redefine_math/templates.yaml:
dataset: lauritowal/redefine_math
templates:
  02ff2949-0f45-4d97-941e-6fa4c0afbc2d: !Template
    answer_choices: 0 ||| 1
    id: 02ff2949-0f45-4d97-941e-6fa4c0afbc2f
    jinja: Question... {{text}} ||| {{ answer_choices [label] }}
    metadata: !TemplateMetadata
      choices_in_prompt: false
      languages:
      - en
      metrics:
      - Accuracy
      original_task: true
    name: Choices
    reference: ''
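For readers unfamiliar with the format, the `|||` separator in the template above divides the `jinja` field into a prompt part and a target part, and divides `answer_choices` into the individual choices. A small sketch using plain string operations only (it does not render the Jinja expressions; the helper name `split_on_separator` is hypothetical):

```python
# Strings copied from the templates.yaml above.
jinja_source = "Question... {{text}} ||| {{ answer_choices [label] }}"
answer_choices_source = "0 ||| 1"


def split_on_separator(source, separator="|||"):
    """Split on the separator and trim surrounding whitespace."""
    return [part.strip() for part in source.split(separator)]


prompt_part, target_part = split_on_separator(jinja_source)
choices = split_on_separator(answer_choices_source)
```

Here `prompt_part` holds the text shown to the model, `target_part` holds the expected output expression, and `choices` holds the two answer options.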

However, I still can't see my dataset under "Sourcing" (or anywhere else)

@lauritowal are you developing on a fork? I can help debug from there.

@VictorSanh sure, have a look at: Cadenza-Labs@a0ac44c
Thanks a lot!

@VictorSanh did it work for you?

Okay, it seems my dataset is not in the list returned by https://huggingface.co/api/datasets?full=true

Indeed, that seems to be the root cause. I have asked internally why your dataset does not appear in the result of the query. I will get back to you tomorrow morning, as most of the team is in Europe.

Fixed!
api/datasets/ is now paginated and your dataset was appearing on the 2nd page.
LMK if it works on your side!
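For anyone who wants to check a paginated listing themselves, below is a rough sketch of walking the pages until a dataset ID turns up. It rests on assumptions not confirmed in this thread: that the next page is advertised through an HTTP `Link` header with `rel="next"`, and that each returned item carries an `id` field. The names `fetch_page` and `dataset_listed` are hypothetical.

```python
import json
import re
import urllib.request

# Matches the URL of a rel="next" entry in an HTTP Link header.
NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def fetch_page(url):
    """Fetch one page; return (list of dataset dicts, Link header or None)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp), resp.headers.get("Link")


def dataset_listed(dataset_id, first_url, fetch=fetch_page):
    """Follow pages until dataset_id is found or no next page remains."""
    url = first_url
    while url:
        items, link = fetch(url)
        # Assumed: each item is a dict with an "id" key.
        if any(item.get("id") == dataset_id for item in items):
            return True
        match = NEXT_LINK.search(link or "")
        url = match.group(1) if match else None
    return False
```

Calling `dataset_listed("lauritowal/redefine-math", "https://huggingface.co/api/datasets?full=true")` would run the check against the live endpoint; the `fetch` parameter exists so the paging logic can be exercised without network access.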

it works now, awesome! Thanks a lot @VictorSanh

amazing! closing this then