jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Config error in Multi-task Semantic Entity Recognition on XFUND

tomas-gajarsky opened this issue

I am getting errors when trying to run Multi-task Semantic Entity Recognition on XFUND following the instructions in the README. Specifically, the config initialisation on line 127 of run_xfun_ser.py fails with the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 546, in get_config_dict
    resolved_config_file = cached_path(
  File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1402, in cached_path
    output_path = get_from_cache(
  File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1574, in get_from_cache
    r.raise_for_status()
  File "/opt/conda/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/lilt-infoxlm-base/resolve/main/config.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 527, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 570, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'lilt-infoxlm-base'. Make sure that:

- 'lilt-infoxlm-base' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'lilt-infoxlm-base' is the correct path to a directory containing a config.json file

- or 'main' is a valid git identifier (branch name, a tag name, or a commit id) that exists for this model name as listed on its model page on 'https://huggingface.co/models'

I have also tried downloading the model from the provided OneCloud link and pointing the config_name argument to the config.json contained in the compressed file; however, I get a different error in that case:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 529, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 278, in __getitem__
    raise KeyError(key)
KeyError: 'liltrobertalike'
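For context, this KeyError comes from a registry lookup: AutoConfig maps the model_type string in config.json to a config class, and the repo's custom 'liltrobertalike' type does not exist in a stock transformers install. A toy sketch of that mechanism (the mapping contents here are illustrative, not the real transformers internals):

```python
# Toy registry standing in for transformers' CONFIG_MAPPING.
# Real entries are config classes; strings are used here for illustration.
CONFIG_MAPPING = {
    "bert": "BertConfig",
    "roberta": "RobertaConfig",
}

def lookup_config_class(model_type: str) -> str:
    """Mimic configuration_auto.py: an unknown model_type raises KeyError."""
    if model_type not in CONFIG_MAPPING:
        raise KeyError(model_type)
    return CONFIG_MAPPING[model_type]
```

So any config.json whose model_type is 'liltrobertalike' can only be loaded by a transformers install that has been patched (as this repo's modified init does) or that ships LiLT natively.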

I had to change the init file and update the requirements because I was facing the same problem as in issue #32. The updated versions are:

datasets==2.7.1
transformers==4.11.3

Did something change or am I doing something wrong?

commented

Your transformers version might be too old. Running pip install --upgrade transformers should fix it.

LiLT wasn't added to transformers until at least 4.24, I think.
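A quick stdlib-only way to check whether an installed version is new enough. This is a naive sketch that assumes plain numeric versions (no dev/rc suffixes), and the 4.24.0 threshold is the commenter's hedged guess, not a documented minimum:

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '4.11.3' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".")[:3])

def is_at_least(installed: str, minimum: str) -> bool:
    """True if the installed version meets or exceeds the minimum."""
    return parse_version(installed) >= parse_version(minimum)

# The reporter's pinned 4.11.3 predates Hub-hosted LiLT support,
# while the 4.25.1 that later worked clears the guessed 4.24.0 bar.
```

In practice you would compare transformers.__version__ against the threshold, or use packaging.version.Version for robust parsing.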

commented

Also, the correct model identifier on Hugging Face is SCUT-DLVCLab/lilt-infoxlm-base -> https://huggingface.co/SCUT-DLVCLab/lilt-infoxlm-base

Thank you @logan-markewich, I was able to load the pre-trained model after upgrading transformers to 4.25.1 and setting the model_name_or_path argument to SCUT-DLVCLab/lilt-infoxlm-base. The requirements.txt file and the README should be updated accordingly.
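Concretely, the working setup can be sketched as follows. Only the model identifier flag is shown; the remaining training arguments (dataset, output directory, etc.) follow the repo's README and are elided here:

```shell
# Upgrade to a transformers release that ships LiLT support
# (4.25.1 is the version confirmed working in this thread).
pip install --upgrade "transformers>=4.25.1"

# Point the script at the full Hub identifier, including the org prefix.
python run_xfun_ser.py --model_name_or_path SCUT-DLVCLab/lilt-infoxlm-base ...
```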

commented

@tomas-gajarsky glad it worked!

In the future, I would probably recommend training using huggingface. It's a bit more work to set up, but this repo seems to be abandoned.