facebookresearch / pytext

A natural language modeling framework based on PyTorch

Home Page: https://pytext.readthedocs.io/en/master/

Attribute error during prediction step of hierarchical intent and slot filling example

ButteredGroove opened this issue

Steps to reproduce

  1. Ubuntu 16.04, Python 3.7, CUDA 9.0 install
  2. pip3 install pytext-nlp
  3. git clone https://github.com/NVIDIA/apex
  4. cd apex
  5. pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  6. cd ..
  7. wget https://fb.me/semanticparsingdialog
  8. unzip top-dataset-semantic-parsing.zip
  9. Grab yourself a copy of https://github.com/facebookresearch/pytext/blob/master/demo/configs/rnng.json
  10. Edit rnng.json to point "train_filename" to top-dataset-semantic-parsing/train.tsv
  11. Edit rnng.json to point "test_filename" to top-dataset-semantic-parsing/test.tsv
  12. Edit rnng.json to point "eval_filename" to top-dataset-semantic-parsing/eval.tsv
  13. Train a model. It'll take around an hour:
    pytext train < rnng.json
  14. Run predict step:
    pytext predict-py --model-file=/tmp/model.pt
  15. When prompted for a json example, try:
    {"text": "traffic in Los Angeles"}
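Steps 10 to 12 can also be scripted. This is a minimal sketch, assuming the three filename keys appear somewhere inside the config's nested JSON. The helper `set_keys` and the inline sample config are hypothetical, and the real rnng.json may be laid out differently:

```python
import json


def set_keys(node, updates):
    """Overwrite every key named in `updates`, wherever it sits in nested JSON.

    Returns the number of values replaced. Hypothetical helper, not part of PyText.
    """
    changed = 0
    if isinstance(node, dict):
        for key in node:
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(node[key], updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


# Stand-in for the parsed contents of rnng.json; this layout is an assumption.
config = json.loads("""
{"task": {"SemanticParsingTask": {"data": {"source": {"TSVDataSource": {
    "train_filename": "old/train.tsv",
    "test_filename": "old/test.tsv",
    "eval_filename": "old/eval.tsv"
}}}}}}
""")

data_dir = "top-dataset-semantic-parsing"
changed = set_keys(config, {
    "train_filename": f"{data_dir}/train.tsv",
    "test_filename": f"{data_dir}/test.tsv",
    "eval_filename": f"{data_dir}/eval.tsv",
})

source = config["task"]["SemanticParsingTask"]["data"]["source"]["TSVDataSource"]
print(changed, source["train_filename"])
```

To apply it to the real file, load it with `json.load`, call `set_keys`, and write it back with `json.dump`. If `changed` is not 3, the config does not contain all three keys and should be inspected by hand.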

Observed Results

$ pytext predict-py --model-file=/tmp/model.pt
Loading model from model.pt...
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "traffic in Los Angeles"}
Traceback (most recent call last):
  File "/home/user/.local/bin/pytext", line 10, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/pytext/main.py", line 407, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/.local/lib/python3.7/site-packages/pytext/task/task.py", line 216, in predict
    model_inputs, context = self.data_handler.get_predict_iter(examples)
AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler'

Expected Results

The expectation was to see predictions and scores.

Relevant Code

See above.

I forgot to mention: this is based on the hierarchical intent and slot filling tutorial.

pytext predict-py is currently broken. Thanks for reporting the issue, we're working on fixing it.

Actually, the related issue (#701) seems to be closed. Are you still seeing this with the latest code?

I had been installing from pip, not the repo. So, I did a fresh install of pytext following the instructions here: https://pytext.readthedocs.io/en/master/installation.html#install-from-source

pytest and pytest --cov worked fine.

However, when I run pytext I'm getting the following error:

$ pytext train < rnng.json
8<--- snip --->8
Traceback (most recent call last):
  File "/home/user/.local/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/.local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/main.py", line 350, in train
    train_model(config, metric_channels=metric_channels)
  File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 89, in train_model
    config, dist_init_url, device_id, rank, world_size, metric_channels, metadata
  File "/gpfs-volume/pytext/pytext/pytext/workflow.py", line 125, in prepare_task
    config.task, metadata=metadata, rank=rank, world_size=world_size
  File "/gpfs-volume/pytext/pytext/pytext/task/task.py", line 43, in create_task
    world_size=world_size,
  File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
    return cls.from_config(config, *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 100, in from_config
    tensorizers, data = cls._init_tensorizers(config, tensorizers, rank, world_size)
  File "/gpfs-volume/pytext/pytext/pytext/task/new_task.py", line 142, in _init_tensorizers
    init_tensorizers=init_tensorizers,
  File "/gpfs-volume/pytext/pytext/pytext/config/component.py", line 154, in create_component
    return cls.from_config(config, *args, **kwargs)
  File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 243, in from_config
    **kwargs,
  File "/gpfs-volume/pytext/pytext/pytext/data/data.py", line 270, in __init__
    initialize_tensorizers(self.tensorizers, full_train_data)
  File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1306, in initialize_tensorizers
    for row in data_source:
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 243, in _convert_raw_source
    example = self._read_example(row)
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 217, in _read_example
    example[name] = self.load(value, self.schema[name])
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 264, in load
    return converter(value)
  File "/gpfs-volume/pytext/pytext/pytext/data/sources/data_source.py", line 328, in load_json
    return json.loads(s)
  File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Exception ignored in: <generator object AnnotationNumberizer.initialize at 0x7fbd6004b9a8>
Traceback (most recent call last):
  File "/gpfs-volume/pytext/pytext/pytext/data/tensorizers.py", line 1211, in initialize
    self.shift_idx = self.vocab.idx[SHIFT]
KeyError: 'SHIFT'
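The traceback does not show the value that `load_json` received, so the cause is a guess. One input that produces this exact message is a string that starts with `[` followed by a character that cannot begin a JSON value, for example a bracketed annotation passed to `json.loads`. A minimal sketch of that, where the sample string is hypothetical and not copied from the dataset:

```python
import json

# Hypothetical column value in a bracketed annotation style (an assumption).
sample = "[IN:GET_INFO_TRAFFIC traffic in [SL:LOCATION Los Angeles ] ]"

try:
    json.loads(sample)
    error = None
except json.JSONDecodeError as exc:
    # "[" opens a JSON array, but "I" at index 1 cannot start a JSON value.
    error = exc

print(error)
```

If this is what happened, the column is being read with a JSON loader although its contents are not JSON, which points at a mismatch between the data source schema and the TSV columns rather than at a corrupt file.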

I observed this issue with pytext v0.2.2 today.

Wondering if there's more news on this front? Thanks!

I split off the problem of the demo failing to train with v0.2.2 into its own issue. Once that's fixed, maybe we can revisit the issue here about predictions not working.

There is a fix for the training failure, with a pull request to merge it into master:
https://github.com/facebookresearch/pytext/pull/1151
But the predict step is still failing. Is there a known solution?

Same problem with predict (AttributeError: 'SemanticParsingTask' object has no attribute 'data_handler') - Any hint on what the issue is?

Observed this issue with pytext v0.3.0 as well.

The error still occurs with the current master branch (0.3.1) of pytext.

Install

$ git clone git@github.com:facebookresearch/pytext.git
$ cd pytext
$ python3 -m venv pytext_venv
$ source pytext_venv/bin/activate
$ pip install --upgrade pip
$ pip install torch
$ ./install_deps
$ export LANG=en_US.utf8
$ export LC_ALL=en_US.utf8

Config file (top.json)

{
  "task": {
    "SemanticParsingTask": {
      "data": {
        "batcher": {
          "PoolingBatcher": {
            "eval_batch_size": 1,
            "test_batch_size": 1,
            "train_batch_size": 1
          }
        },
        "source": {
          "TSVDataSource": {
            "field_names": ["text", "tokenized_text", "seqlogical"],
            "train_filename": "/home/user/top/train.tsv",
            "test_filename": "/home/user/top/test.tsv",
            "eval_filename": "/home/user/top/eval.tsv"
          }
        }
      },
      "model": {
        "lstm": {
          "dropout": 0.34,
          "lstm_dim": 16,
          "num_layers": 2,
          "bidirectional": true
        },
        "ablation": {
          "use_buffer": true,
          "use_stack": true,
          "use_action": true,
          "use_last_open_NT_feature": false
        },
        "constraints": {
          "intent_slot_nesting": true,
          "ignore_loss_for_unsupported": false,
          "no_slots_inside_unsupported": true
        },
        "max_open_NT": 10,
        "dropout": 0.34,
        "compositional_type": "sum"
      },
      "metric_reporter": {
        "text_column_name": "tokenized_text"
      },
      "trainer": {
        "real_trainer": {
          "report_train_metrics": false,
          "epochs": 1
        }
      }
    }
  },
  "version": 12
}

Training

$ pytext train < top.json

Predict output using tutorial (key is "text")

$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"text": "order coffee from starbucks"}
Traceback (most recent call last):
  File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
    _, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
  File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
    for raw_batch, numberized_batch in batches:
  File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
    for super_pool in self._group_iter(iterable, super_pool_size, None):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
    for group in itertools.zip_longest(*iterators):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 454, in numberize
    tokens, start_idx, end_idx = self._lookup_tokens(row[self.text_column])
KeyError: 'tokenized_text'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
$

Predict output using column_to_read (key is "tokenized_text")

$ pytext predict-py --model-file=/tmp/model.pt
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Install apex from https://github.com/NVIDIA/apex/.
Loading model from /tmp/model.pt
Loaded checkpoint...
Use config saved in snapshot
Creating task: SemanticParsingTask...
Skipped initializing tensorizers since they are loaded from a previously saved state.
Loading model from model state dict...
Loaded!
please input a json example, the names should be the same with column_to_read in model training config:
{"tokenized_text": "order coffee"}
Traceback (most recent call last):
  File "/home/user/pytext/pytext/pytext_venv/bin/pytext", line 11, in <module>
    load_entry_point('pytext-nlp', 'console_scripts', 'pytext')()
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/pytext/pytext/pytext_venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/pytext/pytext/pytext/main.py", line 447, in predict_py
    pprint.pprint(task.predict([json.loads(line)])[0])
  File "/home/user/pytext/pytext/pytext/task/new_task.py", line 233, in predict
    _, inputs = next(pad_and_tensorize_batches(self.data.tensorizers, batches))
  File "/home/user/pytext/pytext/pytext/data/data.py", line 157, in pad_and_tensorize_batches
    for raw_batch, numberized_batch in batches:
  File "/home/user/pytext/pytext/pytext/data/data.py", line 140, in batchify
    for super_pool in self._group_iter(iterable, super_pool_size, None):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 78, in _group_iter
    for group in itertools.zip_longest(*iterators):
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in numberize_rows
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/data.py", line 300, in <dictcomp>
    for name, tensorizer in self.tensorizers.items()
  File "/home/user/pytext/pytext/pytext/data/tensorizers.py", line 1578, in numberize
    annotation = Annotation(row[self.column])
KeyError: 'seqlogical'
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
Destroying TSV object
Total number of rows read: 0
Total number of rows dropped: 0
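Both tracebacks end in the same dict comprehension over `self.tensorizers.items()`, and each attempt fails on a different column (`tokenized_text`, then `seqlogical`). That pattern is what you would see if every tensorizer reads its own column from the input row, including the one for the label. A toy sketch of that reading follows. It is a stand-in and not PyText code: the tensorizer names and both helper functions are hypothetical, and only the column names come from the logs above.

```python
def numberize_row(row, column_by_tensorizer):
    """Toy stand-in: each tensorizer looks up its own column in the row."""
    return {name: row[column] for name, column in column_by_tensorizer.items()}


def missing_columns(row, column_by_tensorizer):
    """List the columns a row would need before numberize_row can succeed."""
    return sorted(set(column_by_tensorizer.values()) - set(row))


# Hypothetical tensorizer names mapped to the columns seen in the tracebacks.
columns = {"tokens": "tokenized_text", "actions": "seqlogical"}

first_try = {"text": "order coffee from starbucks"}
second_try = {"tokenized_text": "order coffee"}

print(missing_columns(first_try, columns))
print(missing_columns(second_try, columns))

try:
    numberize_row(second_try, columns)
    failed_on = None
except KeyError as exc:
    failed_on = exc.args[0]
print(failed_on)
```

Under this reading, the prediction path asks for the label column even though a prediction request has no label to give.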