zeno-ml / zeno

AI Data Management & Evaluation Platform

Home Page: https://zenoml.com

Ability to provide multiple tags per example

neubig opened this issue

Currently, it is possible to provide a string-valued feature for each example, such as birthplace="usa".

However, I'm not sure whether it's possible to add multiple tags to a single example, such as languages=["english", "japanese"]. This would be helpful because a document can have many topics, or an image can contain many different objects, and it would be nice to be able to query on them.

Is this possible in Zeno now? If not, it would be a good feature to add.

Hmm, can you describe what you mean by a string-valued feature? Is this in the feature text box on the left? You should be able to just add multiple, and they will be ANDed together, I think. If you want to OR them, you can do that by creating a slice with the tags.

Yeah, these are the features in the text box on the left. Normally the way these are specified is through adding a column to the dataframe that is passed in to Zeno, e.g.:

import pandas as pd

# One column per feature; each list holds one entry per example.
df = pd.DataFrame({
    "data": data_list,
    "label": label_list,
    "birthplace": birthplace_list,
})

where the type of birthplace_list is list[str] (one str for each example).
But I would like to have something like languages_list with type list[list[str]] (multiple strs for each example). Then you could create one slice for all articles where "english" is in the tag list and another where "japanese" is in it. These slices could overlap, because a single example can have both "english" and "japanese" active.
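
To make the request concrete, here is a minimal pandas sketch of a list-valued column and two overlapping slices. The toy data and the has_tag helper are hypothetical, and this shows plain pandas filtering, not anything Zeno is confirmed to do.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the example above.
df = pd.DataFrame({
    "data": ["doc a", "doc b", "doc c"],
    "label": ["news", "blog", "news"],
    "languages": [["english", "japanese"], ["english"], ["japanese"]],
})


def has_tag(frame, column, tag):
    """Return a boolean mask that is True where `tag` is in the row's tag list."""
    return frame[column].apply(lambda tags: tag in tags)


english_slice = df[has_tag(df, "languages", "english")]
japanese_slice = df[has_tag(df, "languages", "japanese")]

# Row 0 carries both tags, so it appears in both slices.
overlap = sorted(set(english_slice.index) & set(japanese_slice.index))
print(overlap)
```

Membership is tested per row, so an example with several tags lands in every matching slice.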

Pretty sure you can do this a bit "informally" - Zeno will treat list[list[str]] as a string. You can then create two predicates in the slice builder for the two languages like this:

[Screenshot, 2023-06-09 08:18:55: slice builder with one predicate per language]
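
The "informal" approach can be approximated outside Zeno as below. This sketch assumes the stringified tag list looks like Python's str() of a list, which the thread does not confirm. The contains helper is hypothetical. It matches the quoted tag so that a tag such as "old english" does not count as "english".

```python
# Assumption: each tag list is stored as text resembling Python's str() output.
languages = [["english", "japanese"], ["english"], ["japanese"]]
as_strings = [str(tags) for tags in languages]


def contains(text, tag):
    """Match the quoted form of `tag` to avoid partial hits inside longer tags."""
    return repr(tag) in text


english_rows = [i for i, s in enumerate(as_strings) if contains(s, "english")]
japanese_rows = [i for i, s in enumerate(as_strings) if contains(s, "japanese")]

# A bare substring test would wrongly match the tag "old english".
naive_hit = "english" in str(["old english"])
quoted_hit = contains(str(["old english"]), "english")
print(english_rows, japanese_rows, naive_hit, quoted_hit)
```

Whether a predicate in the slice builder can match the quoted form depends on how Zeno actually serializes the column, so check the stored strings first.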

Ah, nice. That should do for now. But it'd still be nice to be able to do it in the standard UI (including visualizing the frequency of tags and clicking on bars corresponding to each tag).
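
As a rough illustration of the counts such a tag-frequency chart would plot, here is a short standard-library sketch. The data is hypothetical, and nothing here reflects an existing Zeno feature.

```python
from collections import Counter

# Hypothetical tag lists, one list per example.
languages = [
    ["english", "japanese"],
    ["english"],
    ["japanese"],
    ["english", "french"],
]

# Count each tag at most once per example, so the bars show numbers of examples.
tag_counts = Counter(tag for tags in languages for tag in set(tags))

for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```

Because one example can carry several tags, the bar heights can add up to more than the number of examples.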