Attribute error during prediction step of hierarchical intent and slot filling example

Question

ButteredGroove opened this issue 5 years ago

Steps to reproduce

Ubuntu 16.04, Python 3.7, CUDA 9.0 install
pip3 install pytext-nlp
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
wget https://fb.me/semanticparsingdialog
unzip top-dataset-semantic-parsing.zip
Grab yourself a copy of https://github.com/facebookresearch/pytext/blob/master/demo/configs/rnng.json
Edit rnng.json to point "train_filename" to top-dataset-semantic-parsing/train.tsv
Edit rnng.json to point "test_filename" to top-dataset-semantic-parsing/test.tsv
Edit rnng.json to point "eval_filename" to top-dataset-semantic-parsing/eval.tsv
Train a model. It'll take around an hour:
pytext train < rnng.json
Run predict step:
pytext predict-py --model-file=/tmp/model.pt
When prompted for a json example, try:
{"text": "traffic in Los Angeles"}
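The three rnng.json edits above can also be scripted. This is a minimal sketch that assumes only that the keys "train_filename", "test_filename" and "eval_filename" appear somewhere in the JSON. It searches the whole document instead of hard-coding a nesting path, because the layout of rnng.json may differ between pytext versions. The helper names set_keys and patch_file, and the nesting in the demo dictionary, are made up for illustration.

```python
import json

# Paths taken from the steps above.
PATHS = {
    "train_filename": "top-dataset-semantic-parsing/train.tsv",
    "test_filename": "top-dataset-semantic-parsing/test.tsv",
    "eval_filename": "top-dataset-semantic-parsing/eval.tsv",
}


def set_keys(node, updates):
    """Overwrite every key named in `updates`, at any depth; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


def patch_file(path, updates):
    """Hypothetical helper: rewrite a JSON config file in place."""
    with open(path) as handle:
        config = json.load(handle)
    changed = set_keys(config, updates)
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
    return changed


# In-memory demo; the nesting here is an assumption, not the real rnng.json.
demo = {"task": {"data": {"source": {"train_filename": "", "test_filename": "", "eval_filename": ""}}}}
print(set_keys(demo, PATHS), "keys updated")
```

To apply it to the real file, call patch_file("rnng.json", PATHS) and check that it reports three updated keys.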

Observed Results

$ pytext predict-py --model-file=/tmp/model.pt
Loading model from model.pt...
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "traffic in Los Angeles"}
Traceback (most recent call last):
  File "/home/user/.local/bin/pytext", line 10, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/pytext/main.py", line 407, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/.local/lib/python3.7/site-packages/pytext/task/task.py", line 216, in predict
    model_inputs, context = self.data_handler.get_predict_iter(examples)
AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler'

Expected Results

The expectation was to see predictions and scores.

Relevant Code

See above.

ButteredGroove · Answer 1 · Wed Aug 07 2019 09:48:45 GMT+0800 (China Standard Time)

I forgot to mention: this is based on the hierarchical intent and slot filling tutorial.

Barlas Oguz · Answer 2 · Tue Aug 13 2019 03:24:08 GMT+0800 (China Standard Time)

pytext predict-py is currently broken. Thanks for reporting the issue; we're working on fixing it.

Barlas Oguz · Answer 3 · Tue Aug 13 2019 03:26:51 GMT+0800 (China Standard Time)

Actually, the related issue (#701) seems to be closed. Are you still seeing the problem with the latest code?

ButteredGroove · Answer 4 · Tue Aug 13 2019 05:58:36 GMT+0800 (China Standard Time)

I had been installing from pip, not the repo. So, I did a fresh install of pytext following the instructions here: https://pytext.readthedocs.io/en/master/installation.html#install-from-source

pytest and pytest --cov worked fine.

However, when I run pytext I'm getting the following error:

$ pytext train < rnng.json
8<--- snip --->8
Traceback (most recent call last):
  File "/home/user/.local/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/main.py", line 350, in train
    train_model(config, metric_channels=metric_channels)
  File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 89, in train_model
    config, dist_init_url, device_id, rank, world_size, metric_channels, metadata
  File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 125, in prepare_task
    config.task, metadata=metadata, rank=rank, world_size=world_size
  File "/gpfs-volume/pytext/pytext/pytext/task/task.py", line 43, in create_task
    world_size=world_size,
  File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
    return cls.from_config(config, *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 100, in from_config
    tensorizers, data = cls._init_tensorizers(config, tensorizers, rank, world_size)
  File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 142, in _init_tensorizers
    init_tensorizers=init_tensorizers,
  File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
    return cls.from_config(config, *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 243, in from_config
    **kwargs,
  File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 270, in __init__
    initialize_tensorizers(self.tensorizers, full_train_data)
  File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1306, in initialize_tensorizers
    for row in data_source:
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 243, in _convert_raw_source
    example = self._read_example(row)
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 217, in _read_example
    example[name] = self.load(value, self.schema[name])
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 264, in load
    return converter(value)
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 328, in load_json
    return json.loads(s)
  File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Exception ignored in: <generator object AnnotationNumberizer.initialize at 0x7fbd6004b9a8>
Traceback (most recent call last):
  File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1211, in initialize
    self.shift_idx = self.vocab.idx[SHIFT]
KeyError: 'SHIFT'
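For anyone reading the traceback above: the message "Expecting value: line 1 column 2 (char 1)" means the first character of the string was accepted and the second was not. One input that produces exactly this message is a string that starts with "[" and continues with non-JSON text, for example a bracketed annotation in the style of the TOP dataset. Whether that is what happened in this run is an assumption; the thread does not confirm it. The sample string and the helper name below are made up for illustration.

```python
import json


def decode_error_message(text):
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Illustrative TOP-style annotation: "[" opens a JSON array, then "I" is not a value.
sample = "[IN:GET_INFO_TRAFFIC traffic in [SL:LOCATION Los Angeles ] ]"
print(decode_error_message(sample))

# A well-formed JSON object parses without error.
print(decode_error_message('{"text": "traffic in Los Angeles"}'))
```

If this is the cause, the column being parsed as JSON holds annotation text, which would point at a mismatch between the data file's columns and the types the data source expects for them.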

deepali-c · Answer 5 · Fri Aug 16 2019 17:21:44 GMT+0800 (China Standard Time)

I observed this issue with pytext v0.2.2 today.

ButteredGroove · Answer 6 · Sat Aug 31 2019 07:50:54 GMT+0800 (China Standard Time)

Wondering if there's more news on this front? Thanks!

ButteredGroove · Answer 7 · Sat Sep 28 2019 04:15:41 GMT+0800 (China Standard Time)

I split off the problem of the demo failing to train with v0.2.2 into its own issue. Once that's fixed, maybe we can revisit the issue here about predictions not working.

abercher · Answer 8 · Mon Nov 25 2019 21:13:41 GMT+0800 (China Standard Time)

There is a fix for the training failure, with a pull request to merge it into master:
https://github.com/facebookresearch/pytext/pull/1151
But predict is still failing. Is there a known solution?

Parker · Answer 9 · Sat Nov 30 2019 16:17:06 GMT+0800 (China Standard Time)

Same problem with predict (AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler') - Any hint on what the issue is?

deepali-c · Answer 10 · Thu Dec 26 2019 14:40:58 GMT+0800 (China Standard Time)

Observed this issue with pytext v0.3.0 as well.

ButteredGroove · Answer 11 · Wed Jan 15 2020 07:24:30 GMT+0800 (China Standard Time)

The error still occurs with the current master branch (0.3.1) of pytext.

Install

$ git clone git@github.com:facebookresearch/pytext.git
$ cd pytext
$ python3 -m venv pytext_venv
$ source pytext_venv/bin/activate
$ pip install --upgrade pip
$ pip install torch
$ ./install_deps
$ export LANG=en_US.utf8
$ export LC_ALL=en_US.utf8

Config file (top.json)

{
  "task": {
    "SemanticParsingTask": {
      "data": {
        "batcher": {
          "PoolingBatcher": {
            "eval_batch_size": 1,
            "test_batch_size": 1,
            "train_batch_size": 1
          }
        },
        "source": {
          "TSVDataSource": {
            "field_names": ["text", "tokenized_text", "seqlogical"],
            "train_filename": "/home/user/top/train.tsv",
            "test_filename": "/home/user/top/test.tsv",
            "eval_filename": "/home/user/top/eval.tsv"
          }
        }
      },
      "model": {
        "lstm": {
          "dropout": 0.34,
          "lstm_dim": 16,
          "num_layers": 2,
          "bidirectional": true
        },
        "ablation": {
          "use_buffer": true,
          "use_stack": true,
          "use_action": true,
          "use_last_open_NT_feature": false
        },
        "constraints": {
          "intent_slot_nesting": true,
          "ignore_loss_for_unsupported": false,
          "no_slots_inside_unsupported": true
        },
        "max_open_NT": 10,
        "dropout": 0.34,
        "compositional_type": "sum"
      },
      "metric_reporter": {
        "text_column_name": "tokenized_text"
      },
      "trainer": {
        "real_trainer": {
          "report_train_metrics": false,
          "epochs": 1
        }
      }
    }
  },
  "version": 12
}

Training

$ pytext train < top.json

Predict output using tutorial (key is "text")

$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "order coffee from starbucks"}
Traceback (most recent call last):
  File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
    _, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
  File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
    for raw_batch, numberized_batch in batches:
  File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
    for super_pool in self._group_iter(iterable, super_pool_size, None):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
    for group in itertools.zip_longest(*iterators):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 454, in numberize
    tokens, start_idx, end_idx = self._lookup_tokens(row[self.text_column])
KeyError: 'tokenized_text'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
$

Predict output using column_to_read (key is "tokenized_text")

$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"tokenized_text": "order coffee"}
Traceback (most recent call last):
  File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
    _, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
  File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
    for raw_batch, numberized_batch in batches:
  File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
    for super_pool in self._group_iter(iterable, super_pool_size, None):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
    for group in itertools.zip_longest(*iterators):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 1578, in numberize
    annotation = Annotation(row[self.column])
KeyError: 'seqlogical'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
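The two tracebacks above fail in the same place: a dict comprehension in numberize_rows runs every tensorizer on the input row, and each tensorizer looks up its own column (row[self.text_column] in one case, row[self.column] in the other). The sketch below reproduces that pattern with toy stand-ins to show why supplying "tokenized_text" only moves the KeyError on to "seqlogical". ColumnReader, numberize_row, missing_columns and the keys "tokens" and "actions" are hypothetical names, not PyText classes or APIs.

```python
class ColumnReader:
    """Toy stand-in for a tensorizer that reads one column from a row."""

    def __init__(self, column):
        self.column = column

    def numberize(self, row):
        # Same kind of lookup as row[self.column] in the tracebacks.
        return row[self.column]


def numberize_row(tensorizers, row):
    # Every tensorizer sees every row, so every column has to be present.
    return {name: t.numberize(row) for name, t in tensorizers.items()}


def missing_columns(tensorizers, row):
    """List the columns the tensorizers need that the row does not provide."""
    return sorted({t.column for t in tensorizers.values()} - set(row))


# Column names taken from field_names in the config; the dict keys are made up.
tensorizers = {
    "tokens": ColumnReader("tokenized_text"),
    "actions": ColumnReader("seqlogical"),
}

print(missing_columns(tensorizers, {"text": "order coffee from starbucks"}))
print(missing_columns(tensorizers, {"tokenized_text": "order coffee"}))
```

Under this reading, the label column ("seqlogical") is being requested at prediction time even though a prediction input has no label, so no choice of input keys alone avoids the error.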