RasaHQ / rasa-nlu-examples

This repository contains examples of custom components for educational purposes.

Home Page: https://RasaHQ.github.io/rasa-nlu-examples/

SparseBytePairFeaturizer for hindi language is still asking for en model

sids07 opened this issue · comments

I was trying to use SparseBytePairFeaturizer for Hindi, passing only cache_dir. The Hindi model is downloaded, but after the download the component looks for the English model in cache_dir. That file is obviously not there, so it throws a file-not-found error.

My config.yml:

language: hi

pipeline:
  - name: WhitespaceTokenizer
  - name: rasa_nlu_examples.featurizers.dense.GensimFeaturizer
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/embed
    file: hi_gen.kv
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: rasa_nlu_examples.featurizers.sparse.SparseBytePairFeaturizer
    lang: hi
    vs: 1000
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/cache_dir
    model_file: /home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/hi.wiki.bpe.vs1000.model
  - name: DIETClassifier
    random_seed: 42
    intent_classification: True
    entity_recognition: False
    use_masked_language_model: False
    epochs: 300
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy

My project directory has a cache_dir folder with the following subfolders:
..
...
cache_dir:
-- hi:
---- hi.wiki.bpe.vs1000.model
---- hi.wiki.bpe.vs1000.d25.w2v.bin
...
..

Error Message:

Traceback (most recent call last):
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/bin/rasa", line 8, in <module>
sys.exit(main())
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/__main__.py", line 116, in main
cmdline_arguments.func(cmdline_arguments)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 58, in <lambda>
train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 102, in train
finetuning_epoch_fraction=args.epoch_fraction,
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 109, in train
loop,
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/utils/common.py", line 308, in run_in_loop
result = loop.run_until_complete(f)
File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 174, in train_async
finetuning_epoch_fraction=finetuning_epoch_fraction,
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 305, in _train_async_internal
finetuning_epoch_fraction=finetuning_epoch_fraction,
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
**additional_arguments,
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/train.py", line 98, in train
nlu_config, component_builder, model_to_finetune=model_to_finetune
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 163, in __init__
self.pipeline = self._build_pipeline(cfg, component_builder)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 174, in _build_pipeline
component = component_builder.create_component(component_cfg, cfg)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 852, in create_component
component = registry.create_component_by_config(component_config, cfg)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/registry.py", line 193, in create_component_by_config
return component_class.create(component_config, config)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 525, in create
return cls(component_config)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa_nlu_examples/featurizers/sparse/sparse_bpemb_featurizer.py", line 384, in __init__
self.spm = spm.SentencePieceProcessor(model_file=str(model_fp))
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 218, in Init
self.Load(model_file=model_file, model_proto=model_proto)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 367, in Load
return self.LoadFromFile(model_file)
File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "/home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/en.wiki.bpe.vs1000.model": No such file or directory Error #2
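
For anyone hitting the same traceback, a quick way to confirm the mismatch is to list the BPE model files that are actually on disk for a language and compare them with the file name in the error. This is only a minimal sketch: `list_bpe_models` is a hypothetical helper, not part of rasa-nlu-examples, and it assumes the `<cache_dir>/<lang>/` layout shown in the directory listing above.

```python
from pathlib import Path


def list_bpe_models(cache_dir, lang):
    """Return the names of the BPE .model files found under <cache_dir>/<lang>/.

    Hypothetical helper for debugging; assumes the directory layout
    shown in this issue.
    """
    lang_dir = Path(cache_dir) / lang
    # Match files such as hi.wiki.bpe.vs1000.model, whatever the language prefix.
    return sorted(p.name for p in lang_dir.glob("*.wiki.bpe.vs*.model"))
```

With the layout from this issue, `list_bpe_models("/home/sid/Desktop/treeleaf/chatbot/cache_dir", "hi")` would list only the `hi.`-prefixed model, while the error asks for an `en.`-prefixed file in the same folder.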

Thanks for the issue, @m-vdb will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

Just to confirm, could you try:

- name: rasa_nlu_examples.featurizers.sparse.SparseBytePairFeaturizer
  lang: hi
  vs: 1000

The cached use-case is more for folks who want to pre-build docker containers. If you don't pass a folder it should automatically fetch the file if it doesn't exist.

Also! I don't know what dataset you're running this on, but if you have a representative dataset I'd love to hear if these tools increase the performance of your assistant.

@koaning I tried exactly what you are suggesting on my first attempt. It automatically downloaded the Hindi files into the hi directory within cache_dir, but it still asked for the English model file.

Made a PR here: #140.

The PR should contain the fix, if it's still broken, feel free to re-open the issue!

It is still not working, @koaning. Even with the PR made in #140, line 379 still has to be changed.

Current code:

model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"en.wiki.bpe.vs{self.component_config['vs']}.model"
)

Change needed to make it work with languages other than English:

model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"{self.component_config['lang']}.wiki.bpe.vs{self.component_config['vs']}.model"
)
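
To make the difference easy to check outside of Rasa, the proposed path logic can be pulled into a small standalone function. This is just a sketch: `bpe_model_path` is a hypothetical name, and its arguments stand in for the `cache_dir`, `lang` and `vs` values of the component config.

```python
from pathlib import Path


def bpe_model_path(cache_dir, lang, vs):
    """Build the expected BPE model path for a language.

    Hypothetical helper mirroring the proposed change: the language code
    is used for both the subfolder and the file-name prefix, instead of
    a hard-coded "en" prefix.
    """
    return Path(cache_dir) / lang / f"{lang}.wiki.bpe.vs{vs}.model"
```

Called with the values from the config above (`lang="hi"`, `vs=1000`), this yields a path ending in `hi/hi.wiki.bpe.vs1000.model`, which matches the file that was actually downloaded.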