[BUG] Cache is empty with FLAN, but not with seqio
TheExGenesis opened this issue · comments
Caching tasks registered by FLAN produces empty files. I'm running seqio_cache_tasks --output_cache_dir=/root/seqio_cache --module_import=src.register_tasks, where src.register_tasks is a file that registers tasks in a way equivalent to importing flan.v2.mixtures.
I tried registering a task straight via seqio, as shown below, and it cached correctly.
seqio.TaskRegistry.add(
    "wmt19_ende",
    seqio.TfdsDataSource(tfds_name="wmt19_translate/de-en:1.0.0"),
    preprocessors=[
        functools.partial(
            translate, source_language='en', target_language='de'),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos,
    ],
    output_features=task_configs.DEFAULT_OUTPUT_FEATURES,
    metric_fns=[bleu])
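To pin down which cached files actually came out empty, a small standard-library check over the cache directory can help. This is only a sketch: the helper name find_empty_files is hypothetical and not part of seqio or FLAN, and it assumes "empty" means a zero-byte file on disk.

```python
from pathlib import Path


def find_empty_files(cache_dir):
    """Return sorted paths of zero-byte regular files under cache_dir.

    Hypothetical helper (not a seqio or FLAN API). Returns an empty list
    if cache_dir does not exist or is not a directory.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # /root/seqio_cache is the --output_cache_dir used in the command above.
    for path in find_empty_files("/root/seqio_cache"):
        print(path)
```

Comparing the listing for a FLAN-registered task against the wmt19_ende task shows whether every file is empty or only some of them.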
@TheExGenesis Honestly, I'm not sure what behaviour you're seeing, especially without seeing the code you added, and I'm not a seqio expert. I would recommend starting with a small submixture, like cot_zsopt, and tracking the task at various stages of pre-processing. It seems odd to me that it could generate a task but not cache it.
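One way to track a task at various stages of pre-processing is to count how many examples survive each step and find the first step that leaves none. The sketch below uses plain Python lists and toy stages as stand-ins. The function names and the example stages are hypothetical, and real seqio preprocessors operate on tf.data datasets and may need extra arguments, so this shows only the idea rather than the seqio API.

```python
def count_after_each_stage(examples, stages):
    """Apply each (name, fn) stage in order and record the surviving count.

    `examples` is any iterable of examples; each `fn` maps a list of
    examples to an iterable of examples. Both are illustrative stand-ins.
    """
    current = list(examples)
    counts = [("source", len(current))]
    for name, fn in stages:
        current = list(fn(current))
        counts.append((name, len(current)))
    return counts


def first_empty_stage(counts):
    """Return the name of the first stage with zero examples, or None."""
    for name, n in counts:
        if n == 0:
            return name
    return None


if __name__ == "__main__":
    # Toy data and stages, made up for illustration only.
    toy_examples = [{"inputs": "hello"}, {"inputs": ""}]
    toy_stages = [
        ("keep_nonempty", lambda xs: [x for x in xs if x["inputs"]]),
        ("drop_all", lambda xs: []),
    ]
    counts = count_after_each_stage(toy_examples, toy_stages)
    print(counts)
    print(first_empty_stage(counts))
```

If the source already reports zero examples, the problem is upstream of the preprocessors. If a later stage is the first to hit zero, that stage is the place to look.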
I think it was a problem with seqio because I got it to work after updating to their latest commit.