[BUG] CPU Unittest (macos-python3.7) Failed for ArrowTypeError
barry-jin opened this issue
Description
The CPU unittest for macOS with Python 3.7.9 fails with pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column label with type int64').
This is probably because numpy was upgraded to 1.20.0rc1 in the most recent CI runs. After pinning numpy to 1.19.4, the unittest for macOS with Python 3.7.9 passes (link).
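The failing call should be reproducible without gluonnlp. Below is a minimal sketch of the pattern that breaks: writing a DataFrame with an int64 `label` column to Parquet, as `prepare_glue.py` does. The column data here is hypothetical; the real DataFrames are built from the downloaded SuperGLUE task files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a SuperGLUE task DataFrame; prepare_glue.py
# builds the real ones from the downloaded task files.
df = pd.DataFrame({"premise": ["p1", "p2"], "label": [0, 1]})
assert str(df["label"].dtype) == "int64"

out = os.path.join(tempfile.mkdtemp(), "copa.parquet")
try:
    # Under numpy 1.20.0rc1 with the pyarrow version used in CI, this
    # raised ArrowTypeError ('Did not pass numpy.dtype object', ...).
    # With numpy pinned to 1.19.4 the write succeeds.
    df.to_parquet(out)
    print("wrote", out)
except Exception as err:
    print("conversion failed:", type(err).__name__)
```

If this snippet fails in a clean environment with only pandas, pyarrow, and numpy 1.20.0rc1 installed, the bug is upstream rather than in gluonnlp.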
Error Message
_______________________________ test_glue[copa] ________________________________
task = 'copa'

    @pytest.mark.remote_required
    @pytest.mark.parametrize('task', ["cb", "copa", "multirc", "rte", "wic", "wsc", "boolq", "record",
                                      'broadcoverage-diagnostic', 'winogender-diagnostic'])
    def test_glue(task):
        parser = prepare_glue.get_parser()
        with tempfile.TemporaryDirectory() as root:
            args = parser.parse_args(['--benchmark', 'superglue',
                                      '--tasks', task,
                                      '--data_dir', root])
>           prepare_glue.main(args)

tests/data_cli/test_glue.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/gluonnlp/cli/data/general_nlp_benchmark/prepare_glue.py:689: in main
    df.to_parquet(os.path.join(base_dir, '{}.parquet'.format(key)))
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/util/_decorators.py:199: in wrapper
    return func(*args, **kwargs)
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/core/frame.py:2372: in to_parquet
    **kwargs,
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/io/parquet.py:276: in to_parquet
    **kwargs,
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pandas/io/parquet.py:101: in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
pyarrow/table.pxi:1394: in pyarrow.lib.Table.from_pandas
    ???
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:588: in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:588: in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:574: in convert_column
    raise e
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/pyarrow/pandas_compat.py:568: in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
pyarrow/array.pxi:292: in pyarrow.lib.array
    ???
pyarrow/array.pxi:79: in pyarrow.lib._ndarray_to_array
    ???
pyarrow/array.pxi:67: in pyarrow.lib._ndarray_to_type
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column label with type int64')
pyarrow/error.pxi:107: ArrowTypeError
----------------------------- Captured stdout call -----------------------------
Downloading superglue to "/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpdm5pl_ev". Selected tasks = copa
Processing copa...
Downloading /Users/runner/.mxnet/datasets/nlp/glue/superglue/copa.zip from https://dl.fbaipublicfiles.com/glue/superglue/data/v2/COPA.zip...
----------------------------- Captured stderr call -----------------------------
0%| | 0.00/44.0k [00:00<?, ?iB/s]
100%|██████████| 44.0k/44.0k [00:00<00:00, 535kiB/s]
What have you tried to solve it?
Pinned numpy to 1.19.4 in the CI workflow. Further effort is needed to find the root cause.
This needs a proper fix.
We may add numpy as a dependency in our setup.py and pin it to be smaller than 1.20.0.
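A temporary pin could be expressed in setup.py roughly as follows. This is a sketch only: the project's actual setup.py carries more metadata and dependencies than shown here.

```python
from setuptools import setup, find_packages

setup(
    name='gluonnlp',
    packages=find_packages(),
    install_requires=[
        # Temporary pin until the numpy 1.20 / pyarrow incompatibility
        # is resolved upstream.
        'numpy<1.20.0',
    ],
)
```

Pinning in install_requires constrains every install of the package, not just CI, so it should be reverted once the upstream fix lands.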
That can only be a temporary solution. Please also reproduce the bug without gluonnlp and file a bug report upstream so the root cause can be addressed.
Similarly, we have also triggered a bug in wikiextractor, which we should report to their repo.
Should we close this? @barry-jin
Yes, let's close this issue first. I will find the root cause and report it to pyarrow. Once the issues are solved, we can revert #1456.