data-apis / python-record-api

Inferring Python API signatures from tracing usage.

BUG: record_api.line_counts raises TypeError on unexpected keyword arguments to dumps()

datapythonista opened this issue.

I'm getting the following error when calling record_api.line_counts. I'm using version 1.1.1.

/tmp/record_api_results.jsonl contains the output generated by python -m record_api. Here is a sample:

$ head -n 5 record_api_results.jsonl
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}
{"location":"/api_stats/scripts/10002306.py:22","function":{"t":"function","v":{"module":"pandas.io.parsers","name":"_make_parser_function.<locals>.parser_f"}},"bound_params":{"pos_or_kw":[["filepath_or_buffer","../input/train.csv"],["index_col",null]]}}
{"location":"/api_stats/scripts/10002306.py:23","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"shape"]}}
{"location":"/api_stats/scripts/10002306.py:25","function":{"t":"method","v":{"self":{"t":{"module":"pandas.core.frame","name":"DataFrame"}},"name":"head"}},"bound_params":{}}
{"location":"/api_stats/scripts/10002306.py:27","function":{"t":"builtin_function_or_method","v":"getattr"},"params":{"args":[{"t":"module","v":"pandas"},"read_csv"]}}

Call to line_counts:

export PYTHON_RECORD_API_INPUT=/tmp/record_api_results.jsonl
export PYTHON_RECORD_API_OUTPUT=/tmp/record_api_results_line_counts.jsonl
python -m record_api.line_counts

Result:

Counting lines...
reading /tmp/record_api_results.jsonl: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 478985/478985 [00:02<00:00, 217632.24it/s]
writing:   0%|                                                                                                                                                                                                      | 0/13169 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 42, in <module>
    __main__()
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/line_counts.py", line 38, in __main__
    write(row_)
  File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/jsonl.py", line 45, in write_line
    buffer.write(orjson.dumps(o, **kwargs))
TypeError: dumps() got an unexpected keyword argument
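For context: judging only from the progress output above (about 479k lines read, about 13k rows written), line_counts appears to collapse repeated records and count them. The following is a minimal sketch of that kind of pass under that assumption; count_lines, write_rows and the "n" count field are hypothetical names, not record_api's actual implementation or output format.

```python
import collections
import io
import json
from typing import IO, Dict, List


def count_lines(src: IO[str]) -> List[Dict]:
    """Collapse identical JSONL records and count how often each occurs.

    Hypothetical sketch: the real record_api.line_counts may work differently.
    """
    # Key on the raw line text so identical records are grouped; skip blank lines.
    counts = collections.Counter(line.rstrip("\n") for line in src if line.strip())
    rows = []
    for line, n in counts.most_common():  # most frequent first
        row = json.loads(line)
        row["n"] = n  # hypothetical name for the count field
        rows.append(row)
    return rows


def write_rows(rows: List[Dict], dst: IO[str]) -> None:
    # One JSON object per line (JSON Lines).
    for row in rows:
        dst.write(json.dumps(row) + "\n")


sample = io.StringIO('{"location": "a.py:1"}\n{"location": "a.py:2"}\n{"location": "a.py:1"}\n')
out = io.StringIO()
write_rows(count_lines(sample), out)
print(out.getvalue(), end="")
# {"location": "a.py:1", "n": 2}
# {"location": "a.py:2", "n": 1}
```

The write step is where the reported traceback occurs, since that is the first point at which rows are serialized.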

Do you have the same orjson version? I could pin the later one if that's the issue:

$ pip freeze | grep orjson
orjson==3.0.1

I had orjson==3.2, but I downgraded to 3.0.1 and I still have the same problem.

There is something very weird I don't understand:

Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import orjson
>>> orjson.dumps({})
b'{}'
>>> orjson.dumps({}, *[])
b'{}'
>>> orjson.dumps({}, *[], **{})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dumps() got an unexpected keyword argument

For now I'm going to remove the kwargs from the call, since in this case they don't seem to be used anyway.

That looks really weird... I believe we do use it in one place, to pass in the default kwarg and change how some things are serialized...
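One way to keep kwargs support for the case the maintainer describes, while never making the failing f(x, **{}) call, is to forward keyword arguments only when there are some. This is a minimal sketch, not record_api's actual code: the write_line(buffer, o, dumps, **kwargs) signature, the injected serializer, the json_dumps adapter and appending the newline inside the helper are all assumptions. The standard-library json module stands in so the sketch runs anywhere; with orjson one would pass orjson.dumps, which already returns bytes.

```python
import io
import json
from typing import Any, BinaryIO, Callable


def write_line(buffer: BinaryIO, o: Any, dumps: Callable[..., bytes], **kwargs: Any) -> None:
    """Serialize o as one JSON line (hypothetical helper)."""
    if kwargs:
        # The caller asked for custom behaviour, e.g. a default= hook.
        data = dumps(o, **kwargs)
    else:
        # Plain positional call: never hand an empty **{} to the serializer.
        data = dumps(o)
    buffer.write(data + b"\n")


def json_dumps(o: Any, **kwargs: Any) -> bytes:
    # Stand-in for orjson.dumps: stdlib json returns str, so encode it to bytes.
    return json.dumps(o, **kwargs).encode("utf-8")


buf = io.BytesIO()
write_line(buf, {"a": 1}, json_dumps)
# default= is called for objects json can't encode; here it turns a set into a sorted list.
write_line(buf, {"t": {2, 1}}, json_dumps, default=sorted)
print(buf.getvalue())  # b'{"a": 1}\n{"t": [1, 2]}\n'
```

Branching on the kwargs dict costs nothing in the common case and keeps the one call site that needs default= working unchanged.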

That seems like some sort of Python bug? Or an issue with your Python install? I don't understand how f() and f(**{}) could behave differently.
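The maintainer's point can be checked directly. For a function written in Python, the three call forms from the REPL session deliver exactly the same arguments, so the difference presumably arises below the Python level, in how the compiled orjson extension or that particular interpreter build handled an empty keyword dict. The name capture is purely illustrative.

```python
from typing import Any, Dict, Tuple


def capture(*args: Any, **kwargs: Any) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    # Return exactly the positional and keyword arguments the callee received.
    return args, kwargs


plain = capture({})
star_empty = capture({}, *[])
star_star_empty = capture({}, *[], **{})

# All three calls are indistinguishable from inside a pure-Python function.
print(plain == star_empty == star_star_empty)  # True
print(star_star_empty)  # (({},), {})
```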

I upgraded to the latest Python version and it's working now. It's weird that Python had this bug, but who knows... Closing.