SparseBytePairFeaturizer for Hindi still asks for the en model
sids07 opened this issue
I was trying to use SparseBytePairFeaturizer for Hindi. With only cache_dir given, the Hindi model is downloaded, but after the download the featurizer looks for the English model in cache_dir. That file is obviously not there, so it throws a file-not-found error.
My config.yml:
language: hi
pipeline:
  - name: WhitespaceTokenizer
  - name: rasa_nlu_examples.featurizers.dense.GensimFeaturizer
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/embed
    file: hi_gen.kv
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: rasa_nlu_examples.featurizers.sparse.SparseBytePairFeaturizer
    lang: hi
    vs: 1000
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/cache_dir
    model_file: /home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/hi.wiki.bpe.vs1000.model
  - name: DIETClassifier
    random_seed: 42
    intent_classification: True
    entity_recognition: False
    use_masked_language_model: False
    epochs: 300
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
My project directory has a cache_dir folder with the following subfolders:
..
...
cache_dir:
-- hi:
---- hi.wiki.bpe.vs1000.model
---- hi.wiki.bpe.vs1000.d25.w2v.bin
...
..
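A quick pre-flight check can confirm that the files the featurizer should need are actually in the cache before training starts. This is a minimal sketch: the helper name `missing_bpe_files` is hypothetical, and the two file name patterns are assumed from the directory listing above rather than taken from the library.

```python
from pathlib import Path
from typing import List


def missing_bpe_files(cache_dir: str, lang: str, vs: int, dim: int) -> List[Path]:
    """Return the expected BPE files that are absent from cache_dir/lang.

    Assumption: file names follow the pattern seen in the listing above,
    e.g. hi.wiki.bpe.vs1000.model and hi.wiki.bpe.vs1000.d25.w2v.bin.
    """
    lang_dir = Path(cache_dir) / lang
    expected = [
        lang_dir / f"{lang}.wiki.bpe.vs{vs}.model",
        lang_dir / f"{lang}.wiki.bpe.vs{vs}.d{dim}.w2v.bin",
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Adjust the cache path to your own setup; these values mirror the config above.
    for path in missing_bpe_files("cache_dir", lang="hi", vs=1000, dim=25):
        print(f"missing: {path}")
```

If this prints nothing for `lang="hi"` while training still fails, the files are in place and the problem is in how the component builds the path.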
Error Message:
Traceback (most recent call last):
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 102, in train
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 109, in train
    loop,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 174, in train_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 305, in _train_async_internal
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/train.py", line 98, in train
    nlu_config, component_builder, model_to_finetune=model_to_finetune
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 163, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 174, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 852, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/registry.py", line 193, in create_component_by_config
    return component_class.create(component_config, config)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 525, in create
    return cls(component_config)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa_nlu_examples/featurizers/sparse/sparse_bpemb_featurizer.py", line 384, in __init__
    self.spm = spm.SentencePieceProcessor(model_file=str(model_fp))
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 218, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "/home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/en.wiki.bpe.vs1000.model": No such file or directory Error #2
Just to confirm, could you try:
- name: rasa_nlu_examples.featurizers.SparseBytePairFeaturizer
  lang: hi
  vs: 1000
The cached use-case is more for folks who want to pre-build docker containers. If you don't pass a folder it should automatically fetch the file if it doesn't exist.
Also! I don't know what dataset you're running this on, but if you have a representative dataset I'd love to hear if these tools increase the performance of your assistant.
@koaning I tried exactly that on my first attempt. It automatically downloaded the Hindi files into the hi directory under cache_dir, but it still asked for the English model file.
Gotcha. I think I've indeed found the bug here: https://github.com/RasaHQ/rasa-nlu-examples/blob/main/rasa_nlu_examples/featurizers/sparse/sparse_bpemb_featurizer.py#L367.
Made a PR here: #140.
The PR should contain the fix, if it's still broken, feel free to re-open the issue!
It is still not working, @koaning. Even with the PR in #140, line 379 still needs to change.
In the current update:
model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"en.wiki.bpe.vs{self.component_config['vs']}.model"
)
Change needed so that it works for languages other than English:
model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"{self.component_config['lang']}.wiki.bpe.vs{self.component_config['vs']}.model"
)
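The same path logic can be pulled into a small standalone function, which makes the fix easy to check for any language code. This is only a sketch: `bpe_model_path` is a hypothetical helper, not part of rasa-nlu-examples, and it assumes the `<lang>.wiki.bpe.vs<vs>.model` naming shown in this issue.

```python
from pathlib import Path


def bpe_model_path(cache_dir: str, lang: str, vs: int) -> Path:
    """Build the expected model path, using lang for both the folder and the file prefix."""
    # The bug was a hardcoded "en" prefix here; the prefix must follow lang.
    return Path(cache_dir) / lang / f"{lang}.wiki.bpe.vs{vs}.model"


if __name__ == "__main__":
    path = bpe_model_path("cache_dir", lang="hi", vs=1000)
    print(path.name)  # hi.wiki.bpe.vs1000.model
```

With the hardcoded prefix, a call with `lang="hi"` would resolve to `en.wiki.bpe.vs1000.model` inside the `hi` folder, which is exactly the path in the OSError above.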