bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.

error while loading qed in sourcing

shanyas10 opened this issue

Hi, I'm getting the following error when using qed as the dataset in the sourcing view:

ArrowInvalid: Could not convert in with type str: tried to convert to boolean
Traceback:
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/streamlit/script_runner.py", line 354, in _run_script
    exec(code, module.__dict__)
File "/Users/s0s0cr3/Documents/GitHub/promptsource/promptsource/app.py", line 260, in <module>
    dataset = get_dataset(dataset_key, str(conf_option.name) if conf_option else None)
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/streamlit/caching.py", line 543, in wrapped_func
    return get_or_create_cached_value()
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/streamlit/caching.py", line 527, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
File "/Users/s0s0cr3/Documents/GitHub/promptsource/promptsource/utils.py", line 49, in get_dataset
    builder_instance.download_and_prepare()
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/builder.py", line 697, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/builder.py", line 1106, in _prepare_split
    num_examples, num_bytes = writer.finalize()
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/arrow_writer.py", line 456, in finalize
    self.write_examples_on_file()
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/arrow_writer.py", line 325, in write_examples_on_file
    pa_array = pa.array(typed_sequence)
File "pyarrow/array.pxi", line 222, in pyarrow.lib.array
File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
File "/Users/s0s0cr3/Library/Python/3.9/lib/python/site-packages/datasets/arrow_writer.py", line 121, in __arrow_array__
    out = pa.array(cast_to_python_objects(self.data, only_1d_for_numpy=True), type=type)
File "pyarrow/array.pxi", line 305, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status

I've been able to use this dataset fine while prompting, but I'm not sure what's breaking now. I also tried deleting the cache and downloading the dataset again, but I still get the same error.
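For reference, the same download_and_prepare() path from the traceback can presumably be hit outside the Streamlit app with a plain datasets call (untested sketch, assuming the problem is in datasets itself rather than in promptsource):

    # Untested sketch: load qed directly to exercise the same
    # builder.download_and_prepare() path that promptsource's get_dataset() uses.
    from datasets import load_dataset

    dataset = load_dataset("qed")  # should raise the same ArrowInvalid if the bug is present
    print(dataset)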

Maybe a temporary thing? It just downloaded for me.

I just tested and realized that it's a new issue that appeared with datasets==1.15.0.
We may have to pin to datasets==1.14.0.
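Until that's sorted out, pinning the version is one possible workaround (sketch, assuming a pip-managed environment):

    pip install "datasets==1.14.0"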

I am running into the same issue and have re-opened the issue on datasets (see huggingface/datasets#3346 (comment)).
@shanyas10, let's put that dataset aside for now.

Fixed in huggingface/datasets#3417.
You'll need to be on datasets' master, @shanyas10.
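For anyone who wants the fix before the next release, installing datasets from source should work (sketch, assuming a pip-managed environment):

    pip install git+https://github.com/huggingface/datasets.git@master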