Attribute error during prediction step of hierarchical intent and slot filling example
ButteredGroove opened this issue
Steps to reproduce
- Ubuntu 16.04, Python 3.7, CUDA 9.0 install
- pip3 install pytext-nlp
- git clone https://github.com/NVIDIA/apex
- cd apex
- pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- cd ..
- wget https://fb.me/semanticparsingdialog
- unzip top-dataset-semantic-parsing.zip
- Grab yourself a copy of https://github.com/facebookresearch/pytext/blob/master/demo/configs/rnng.json
- Edit rnng.json to point "train_filename" to top-dataset-semantic-parsing/train.tsv
- Edit rnng.json to point "test_filename" to top-dataset-semantic-parsing/test.tsv
- Edit rnng.json to point "eval_filename" to top-dataset-semantic-parsing/eval.tsv
- Train a model. It'll take around an hour:
pytext train < rnng.json
- Run predict step:
pytext predict-py --model-file=/tmp/model.pt
- When prompted for a json example, try:
{"text": "traffic in Los Angeles"}
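The three "Edit rnng.json" steps can also be scripted. A minimal sketch, assuming the three filename keys appear somewhere in the nested config (their exact location may differ between pytext versions, so the helper searches recursively). set_keys is a made-up helper for illustration, not a pytext API:

```python
import json


def set_keys(node, updates):
    """Overwrite every key listed in `updates` wherever it appears
    inside nested dicts and lists. Returns the same (mutated) object."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]   # replace the value in place
            else:
                set_keys(value, updates)   # keep searching deeper
    elif isinstance(node, list):
        for item in node:
            set_keys(item, updates)
    return node


def patch_config(path, updates):
    """Load a JSON config, apply `updates`, and write it back."""
    with open(path) as f:
        config = json.load(f)
    set_keys(config, updates)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


# Paths taken from the steps above.
TOP_PATHS = {
    "train_filename": "top-dataset-semantic-parsing/train.tsv",
    "test_filename": "top-dataset-semantic-parsing/test.tsv",
    "eval_filename": "top-dataset-semantic-parsing/eval.tsv",
}
```

Calling patch_config("rnng.json", TOP_PATHS) from the directory that holds rnng.json would rewrite the file in place. Keys that are absent from the config are left absent, so it is worth checking the result by eye.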
Observed Results
$ pytext predict-py --model-file=/tmp/model.pt
Loading model from model.pt...
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "traffic in Los Angeles"}
Traceback (most recent call last):
File "/home/user/.local/bin/pytext", line 10, in <module>
sys.exit(main())
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/pytext/main.py", line 407, in predict_py
pprint.pprint(task.predict([json.loads(line)])[0])
File "/home/user/.local/lib/python3.7/site-packages/pytext/task/task.py", line 216, in predict
model_inputs, context = self.data_handler.get_predict_iter(examples)
AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler'
Expected Results
The expectation was to see predictions and scores.
Relevant Code
See above.
I forgot to mention: this is based on the hierarchical intent and slot filling tutorial.
pytext predict-py is currently broken. Thanks for reporting the issue, we're working on fixing it.
Actually, the related issue (#701) seems to be closed. Are you still seeing the problem with the latest code?
I had been installing from pip, not the repo. So, I did a fresh install of pytext following the instructions here: https://pytext.readthedocs.io/en/master/installation.html#install-from-source
pytest and pytest --cov worked fine.
However, when I run pytext I'm getting the following error:
$ pytext train < rnng.json
8<--- snip --->8
Traceback (most recent call last):
File "/home/user/.local/bin/pytext", line 11, in <module>
load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/gpfs-volume/pytext/pytext/pytext/main.py", line 350, in train
train_model(config, metric_channels=metric_channels)
File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 89, in train_model
config, dist_init_url, device_id, rank, world_size, metric_channels, metadata
File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 125, in prepare_task
config.task, metadata=metadata, rank=rank, world_size=world_size
File "/gpfs-volume/pytext/pytext/pytext/task/task.py", line 43, in create_task
world_size=world_size,
File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
return cls.from_config(config, *args, **kwargs)
File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 100, in from_config
tensorizers, data = cls._init_tensorizers(config, tensorizers, rank, world_size)
File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 142, in _init_tensorizers
init_tensorizers=init_tensorizers,
File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
return cls.from_config(config, *args, **kwargs)
File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 243, in from_config
**kwargs,
File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 270, in __init__
initialize_tensorizers(self.tensorizers, full_train_data)
File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1306, in initialize_tensorizers
for row in data_source:
File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 243, in _convert_raw_source
example = self._read_example(row)
File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 217, in _read_example
example[name] = self.load(value, self.schema[name])
File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 264, in load
return converter(value)
File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 328, in load_json
return json.loads(s)
File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Exception ignored in: <generator object AnnotationNumberizer.initialize at 0x7fbd6004b9a8>
Traceback (most recent call last):
File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1211, in initialize
self.shift_idx = self.vocab.idx[SHIFT]
KeyError: 'SHIFT'
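For anyone trying to read the JSONDecodeError above: the traceback shows load_json passing a column value straight to json.loads. One input that produces exactly this message is a bracketed annotation string, since "[" opens a JSON array and the next character is not a valid JSON value. This is only a guess at the cause, not a confirmed diagnosis, and the annotation string below is made up for illustration:

```python
import json

# Hypothetical annotation in a bracketed intent/slot style (not copied from the dataset).
annotation = "[IN:GET_INFO_TRAFFIC traffic in Los Angeles ]"


def try_load(s):
    """Return the parsed JSON value, or the decoder's error message on failure."""
    try:
        return json.loads(s)
    except json.JSONDecodeError as err:
        return str(err)


message = try_load(annotation)
print(message)  # Expecting value: line 1 column 2 (char 1)
```

If that is what is happening, the column is being parsed with a JSON converter although it holds a plain string, which would point at the schema or column types rather than at the data file.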
I observed this issue with pytext v0.2.2 today.
Wondering if there's more news on this front? Thanks!
I split off the problem of the demo failing to train with v0.2.2 into its own issue. Once that's fixed, maybe we can revisit the issue here about predictions not working.
There is a fix concerning the training failing and a pull request to add it to master:
https://github.com/facebookresearch/pytext/pull/1151
But the predict step is still failing. Is there any known solution?
Same problem with predict (AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler').
Any hint on what the issue is?
Observed this issue with pytext v0.3.0 as well.
Prediction still fails with the current master branch (0.3.1) of pytext.
Install
$ git clone git@github.com:facebookresearch/pytext.git
$ cd pytext
$ python3 -m venv pytext_venv
$ source pytext_venv/bin/activate
$ pip install --upgrade pip
$ pip install torch
$ ./install_deps
$ export LANG=en_US.utf8
$ export LC_ALL=en_US.utf8
Config file (top.json)
{
"task": {
"SemanticParsingTask": {
"data": {
"batcher": {
"PoolingBatcher": {
"eval_batch_size": 1,
"test_batch_size": 1,
"train_batch_size": 1
}
},
"source": {
"TSVDataSource": {
"field_names": ["text", "tokenized_text", "seqlogical"],
"train_filename": "/home/user/top/train.tsv",
"test_filename": "/home/user/top/test.tsv",
"eval_filename": "/home/user/top/eval.tsv"
}
}
},
"model": {
"lstm": {
"dropout": 0.34,
"lstm_dim": 16,
"num_layers": 2,
"bidirectional": true
},
"ablation": {
"use_buffer": true,
"use_stack": true,
"use_action": true,
"use_last_open_NT_feature": false
},
"constraints": {
"intent_slot_nesting": true,
"ignore_loss_for_unsupported": false,
"no_slots_inside_unsupported": true
},
"max_open_NT": 10,
"dropout": 0.34,
"compositional_type": "sum"
},
"metric_reporter": {
"text_column_name": "tokenized_text"
},
"trainer": {
"real_trainer": {
"report_train_metrics": false,
"epochs": 1
}
}
}
},
"version": 12
}
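Since the config declares three field_names, each row of the TSV files is presumably expected to have three tab-separated columns. A rough sanity check, assuming plain tab-delimited text without quoting (TSVDataSource's own parsing may differ, and bad_rows is a hypothetical helper, not part of pytext):

```python
import csv
import io

# Column names copied from "field_names" in top.json.
FIELD_NAMES = ["text", "tokenized_text", "seqlogical"]


def bad_rows(tsv_file, expected=len(FIELD_NAMES)):
    """Return (line_number, column_count) for each row whose number of
    tab-separated columns differs from `expected`."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (line_no, len(row))
        for line_no, row in enumerate(reader, start=1)
        if len(row) != expected
    ]


# Small in-memory example: the second row has only two columns.
sample = io.StringIO("a\tb\tc\nonly two\tcolumns\n")
print(bad_rows(sample))  # [(2, 2)]
```

To run it against the real data, open the file with open("/home/user/top/train.tsv", newline="") and pass the file object to bad_rows. An empty result only says the column counts line up, not that the contents are valid.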
Training
$ pytext train < top.json
Predict output using tutorial (key is "text")
$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "order coffee from starbucks"}
Traceback (most recent call last):
File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
pprint.pprint(task.predict([json.loads(line)])[0])
File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
_, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
for raw_batch, numberized_batch in batches:
File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
for super_pool in self._group_iter(iterable, super_pool_size, None):
File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
for group in itertools.zip_longest(*iterators):
File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
for name, tensorizer in self.tensorizers.items()
File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
for name, tensorizer in self.tensorizers.items()
File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 454, in numberize
tokens, start_idx, end_idx = self._lookup_tokens(row[self.text_column])
KeyError: 'tokenized_text'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
$
Predict output using column_to_read (key is "tokenized_text")
$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"tokenized_text": "order coffee"}
Traceback (most recent call last):
File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
pprint.pprint(task.predict([json.loads(line)])[0])
File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
_, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
for raw_batch, numberized_batch in batches:
File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
for super_pool in self._group_iter(iterable, super_pool_size, None):
File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
for group in itertools.zip_longest(*iterators):
File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
for name, tensorizer in self.tensorizers.items()
File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
for name, tensorizer in self.tensorizers.items()
File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 1578, in numberize
annotation = Annotation(row[self.column])
KeyError: 'seqlogical'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
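The two KeyErrors above suggest that predict-py looks up every column the tensorizers were configured with, so an example containing only "text" or only "tokenized_text" leaves a lookup unsatisfied. A minimal sketch, assuming the field_names from top.json are the full set of columns a row might need, that lists which keys a JSON example lacks. The helper name missing_fields is made up here and the code does not call pytext:

```python
import json

# Column names copied from "field_names" in top.json.
FIELD_NAMES = ["text", "tokenized_text", "seqlogical"]


def missing_fields(example_json, field_names=FIELD_NAMES):
    """Return the configured column names that are absent from a JSON example."""
    row = json.loads(example_json)
    return [name for name in field_names if name not in row]


print(missing_fields('{"text": "order coffee from starbucks"}'))
# ['tokenized_text', 'seqlogical']
print(missing_fields('{"tokenized_text": "order coffee"}'))
# ['text', 'seqlogical']
```

This reports a superset of what the tensorizers actually read: in the second run above the failure was on seqlogical, not text. Judging by the Annotation(row[self.column]) frame, seqlogical appears to be the annotation (target) column, which would not normally be known at prediction time, so supplying all three keys is at best a workaround and is not confirmed to work in this thread.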