google-research / FLAN

[BUG] Cache is empty with FLAN, but not with seqio

TheExGenesis opened this issue

Caching tasks registered by FLAN produces empty files. I'm running

seqio_cache_tasks --output_cache_dir=/root/seqio_cache --module_import=src.register_tasks

where src.register_tasks is a file that registers tasks in a way equivalent to importing flan.v2.mixtures.
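One way to confirm the symptom is to scan the output directory for zero-byte files. The sketch below is not part of seqio or FLAN: the helper name `find_empty_files` is hypothetical, and it assumes "empty" means a file of size zero anywhere under the cache directory.

```python
from pathlib import Path


def find_empty_files(cache_dir):
    """Return sorted paths of zero-byte regular files under cache_dir.

    Assumption: an "empty" cache file is one whose size on disk is 0 bytes.
    """
    root = Path(cache_dir)
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


# Example usage (path taken from --output_cache_dir in the command above):
#   for path in find_empty_files("/root/seqio_cache"):
#       print("empty:", path)
```

Running this before and after a change (for example, a seqio upgrade) gives a quick check of whether the cached shards are still empty.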

I tried registering a task straight via seqio, as shown below, and it cached correctly.

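import functools
import seqio

# translate, bleu, and task_configs are assumed to be imported or defined
# elsewhere in the registering module; their origin is not shown here.
seqio.TaskRegistry.add(
    "wmt19_ende",
    seqio.TfdsDataSource(tfds_name="wmt19_translate/de-en:1.0.0"),
    preprocessors=[
        functools.partial(
            translate, source_language='en', target_language='de'),
        seqio.preprocessors.tokenize, seqio.preprocessors.append_eos
    ],
    output_features=task_configs.DEFAULT_OUTPUT_FEATURES,
    metric_fns=[bleu])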

@TheExGenesis Honestly, I'm not sure what behaviour you're seeing, especially since I haven't seen the code you added and I'm not a seqio expert. I would recommend starting with a small submixture, like cot_zsopt, and tracking the task at various stages of pre-processing. It seems odd to me that it could generate a task but not cache it.
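The idea of tracking a task through its pre-processing stages can be sketched without any library: apply each stage in turn and record how many examples survive. This is a plain-Python stand-in, not seqio code; the function `count_after_each_stage` and the two example stages are hypothetical and only illustrate the technique.

```python
def count_after_each_stage(examples, stages):
    """Apply stages in order and record how many examples survive each one.

    `stages` is a list of (name, fn) pairs; each fn maps a list of examples
    to a new list. This stands in for a preprocessing pipeline; it is not
    how seqio applies its preprocessors internally.
    """
    counts = [("source", len(examples))]
    current = list(examples)
    for name, fn in stages:
        current = list(fn(current))
        counts.append((name, len(current)))
    return counts


# Hypothetical stages, for illustration only.
def lowercase_inputs(examples):
    return [{**ex, "inputs": ex["inputs"].lower()} for ex in examples]


def drop_empty_targets(examples):
    return [ex for ex in examples if ex["targets"]]


examples = [
    {"inputs": "Translate: Hallo", "targets": "Hello"},
    {"inputs": "Translate: Welt", "targets": ""},
]
stages = [
    ("lowercase_inputs", lowercase_inputs),
    ("drop_empty_targets", drop_empty_targets),
]

for name, n in count_after_each_stage(examples, stages):
    print(name, n)
```

A stage where the count drops to zero is the first place to look when the final cached output is empty.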

I think it was a problem with seqio because I got it to work after updating to their latest commit.