google / seqio

Task-based datasets, preprocessing, and evaluation for sequence models.

Concatenating Tasks?

gahdritz opened this issue:

Is there a way to concatenate multiple Tasks? Mixtures sample from component Tasks until one of them runs out of examples. Is there a variant that uses all of the examples from both Tasks in each epoch?

Working on something similar. Please let me know if you find something.

Do you just want to use all of the examples or do you want one task to come first followed by another? To do the first, set the following argument on the Mixture constructor:

sample_fn = functools.partial(tf.data.Dataset.sample_from_datasets, stop_on_empty_dataset=False)
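
To make the difference concrete, here is a rough pure-Python sketch of what the stop_on_empty_dataset flag changes. It is only an illustration of the semantics, not how tf.data implements sampling; the helper name sample_from_iterables and the toy lists are made up for this example.

```python
import random


def sample_from_iterables(iterables, weights, seed=0, stop_on_empty=True):
    """Illustrative weighted sampler (hypothetical helper, not a tf.data API)."""
    rng = random.Random(seed)
    iterators = [iter(it) for it in iterables]
    weights = list(weights)
    active = list(range(len(iterators)))  # indices of sources still in play
    out = []
    while active:
        # Pick one of the active sources according to its weight.
        idx = rng.choices(active, weights=[weights[i] for i in active])[0]
        try:
            out.append(next(iterators[idx]))
        except StopIteration:
            if stop_on_empty:
                break  # the first exhausted source ends the whole stream
            active.remove(idx)  # otherwise keep drawing from the rest
    return out


short = ["a1", "a2"]
long = ["b1", "b2", "b3", "b4", "b5", "b6"]

# Stops as soon as the sampler lands on an exhausted source.
truncated = sample_from_iterables([short, long], [0.5, 0.5], stop_on_empty=True)
# Keeps going until every source has been drained.
complete = sample_from_iterables([short, long], [0.5, 0.5], stop_on_empty=False)
```

With stop_on_empty=False every example from both lists shows up exactly once, just in a sampled order, which is the "use all of the examples" behavior asked about.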

To do the second, you could probably do the following:

concat_fn = lambda a, b: a.concatenate(b)
...
sample_fn = lambda datasets, *_: functools.reduce(concat_fn, datasets)
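
A small self-contained sketch of the concatenation idea, in case it helps. It assumes the sample function is called with three positional arguments (datasets, rates, seed), which is what the lambda above implies; I have not checked that against the seqio source. ListDataset is a made-up stand-in that only mimics the concatenate method of tf.data.Dataset so the snippet runs without TensorFlow.

```python
import functools


class ListDataset:
    """Hypothetical stand-in that mimics only tf.data.Dataset.concatenate."""

    def __init__(self, items):
        self.items = list(items)

    def concatenate(self, other):
        # Same contract as the real method: self's elements, then other's.
        return ListDataset(self.items + other.items)


def concat_sample_fn(datasets, rates=None, seed=None):
    """Ignore rates and seed; chain the datasets in the order given."""
    return functools.reduce(lambda a, b: a.concatenate(b), datasets)


task_a = ListDataset(["a1", "a2"])
task_b = ListDataset(["b1", "b2", "b3"])

combined = concat_sample_fn([task_a, task_b], [0.5, 0.5], 42)
print(combined.items)  # ['a1', 'a2', 'b1', 'b2', 'b3']
```

Using named parameters (or *_) instead of repeating _ matters because Python rejects duplicate parameter names in a function or lambda definition. With real Tasks you would pass a function like concat_sample_fn as the sample_fn argument on the Mixture constructor, and the mixing rates would then have no effect on the order.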