https://link.springer.com/chapter/10.1007/978-3-031-11644-5_61
Step-by-step instructions for running RASA NLU and the Paraphraser to obtain results.
Pretrained models can be found in the models folder.
The results of the various pretrained models are in the results folder.
paraphraser.py can be found in the paraphraser_tool folder.
AllYamlFiles contains the necessary YAML files in RASA format. The RASA YAML format can be found in this link.
Refer to this link for possible RASA commands.
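For orientation, RASA NLU training data lists each intent followed by a block of "- "-prefixed example utterances. The sketch below renders that layout from a Python dictionary. It is a minimal illustration, not part of this repository: the function name render_nlu_yaml, the intent names, and the version string "2.0" are assumptions, so adjust the version to match your RASA installation.

```python
def render_nlu_yaml(intents, version="2.0"):
    """Render {intent: [examples]} as RASA-style NLU YAML text.

    The version string is an assumption; use the one your RASA release expects.
    """
    lines = [f'version: "{version}"', "", "nlu:"]
    for intent, examples in intents.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        for example in examples:
            # Each example becomes one "- "-prefixed line inside the block.
            lines.append(f"    - {example.strip()}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical intents, for illustration only.
print(render_nlu_yaml({"greet": ["hello", "hi there"], "goodbye": ["bye"]}))
```

The same "- " prefix is what the spreadsheet formula in the paraphrasing steps adds to each expanded example.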
It is recommended to create a virtual environment with Python 3.7 and activate it before installing the dependencies:
pip install -r requirements.txt
With the virtual environment activated in the project folder, run the following steps:
- Split the training data 90/10:
$ rasa data split nlu --nlu data/nlu/appended_original_nlu.yml --training-fraction 0.9
- Rename the 10% test file to non_paraphrased_test_data_from_appended_original_nlu.yml
- Rename the 90% train file to non_paraphrased_training_data_from_appended_original_nlu.yml
- Train the NLU model with non_paraphrased_training_data_from_appended_original_nlu.yml
$ rasa train nlu --nlu data/nlu/non_paraphrased_training_data_from_appended_original_nlu.yml
- Paraphrase non_paraphrased_training_data_from_appended_original_nlu.yml by moving the file to the Paraphraser folder and converting it into XLSX format.
a. Convert YAML to XLSX using: https://www.convertcsv.com/yaml-to-csv.htm
b. Update nlu_test.xlsx by importing the converted data.
c. Run split.py to generate a new expanded.tsv file:
$ python split.py nlu_test.xlsx
d. Prefix each entry in nlu_examples_expanded with "- " using the spreadsheet formula ="- "&<cell>
- Run the paraphrasing Python script.
$ python paraphraser.py
- Rename the resulting YAML file to paraphrased_training_data_from_non_paraphrased_training_data.yml
- Train the NLU model with paraphrased_training_data_from_non_paraphrased_training_data.yml
$ rasa train nlu --nlu data/nlu/paraphrased_training_data_from_non_paraphrased_training_data.yml
- Paraphrase appended_original_nlu.yml by moving the file to the Paraphraser folder and converting it into XLSX format.
a. Convert YAML to XLSX using: https://www.convertcsv.com/yaml-to-csv.htm
b. Update nlu_test.xlsx by importing the converted data.
c. Run split.py to generate a new expanded.tsv file:
$ python split.py nlu_test.xlsx
d. Prefix each entry in nlu_examples_expanded with "- " using the spreadsheet formula ="- "&<cell>
- Run the paraphrasing Python script.
$ python paraphraser.py
- Rename the resulting YAML file to paraphrased_appended_original_nlu.yml
- Split the paraphrased data 90/10:
$ rasa data split nlu --nlu data/nlu/paraphrased_appended_original_nlu.yml --training-fraction 0.9
- Rename the 10% test file to paraphrased_test_data_from_paraphrased_appended_original_nlu.yml
- Test the model created in step 4 (trained on the non-paraphrased training data) with non_paraphrased_test_data_from_appended_original_nlu.yml
$ rasa test nlu --nlu data/nlu/non_paraphrased_test_data_from_appended_original_nlu.yml --model models/<auto-generated-model-name-from-step-4.tar>
- Test the model created in step 4 with paraphrased_test_data_from_paraphrased_appended_original_nlu.yml
$ rasa test nlu --nlu data/nlu/paraphrased_test_data_from_paraphrased_appended_original_nlu.yml --model models/<auto-generated-model-name-from-step-4.tar>
- Test the model created in step 8 (trained on the paraphrased training data) with non_paraphrased_test_data_from_appended_original_nlu.yml
$ rasa test nlu --nlu data/nlu/non_paraphrased_test_data_from_appended_original_nlu.yml --model models/<auto-generated-model-name-from-step-8.tar>
- Test the model created in step 8 with paraphrased_test_data_from_paraphrased_appended_original_nlu.yml
$ rasa test nlu --nlu data/nlu/paraphrased_test_data_from_paraphrased_appended_original_nlu.yml --model models/<auto-generated-model-name-from-step-8.tar>
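The four test runs above are every combination of two trained models and two test sets. As a rough sketch, the commands could be generated rather than typed by hand. The helper name build_test_commands is hypothetical, and the model strings in the usage lines are placeholders that must be replaced with the real auto-generated archive paths.

```python
from itertools import product

# Test sets named in the steps above.
TEST_SETS = [
    "data/nlu/non_paraphrased_test_data_from_appended_original_nlu.yml",
    "data/nlu/paraphrased_test_data_from_paraphrased_appended_original_nlu.yml",
]


def build_test_commands(model_paths, test_sets=TEST_SETS):
    """Return one `rasa test nlu` argument list per (model, test set) pair."""
    return [
        ["rasa", "test", "nlu", "--nlu", test_set, "--model", model]
        for model, test_set in product(model_paths, test_sets)
    ]


# Placeholders only: substitute the archives produced by the two training steps.
commands = build_test_commands(
    ["<model-from-step-4>", "<model-from-step-8>"]
)
for command in commands:
    print(" ".join(command))
```

Each argument list can be passed to subprocess.run once real model paths are filled in. Copy the results folder aside between runs if you want to keep every report.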
The overview of training and test permutations can be seen in the following diagram:
After testing, the results can be found in the results folder.
The performance of the model is summarized at the bottom of the intent_report.json file.
For example:
"accuracy": 0.8537735849056604,
"macro avg": {
"precision": 0.801493710691824,
"recall": 0.8537735849056604,
"f1-score": 0.8179245283018868,
"support": 212
},
"weighted avg": {
"precision": 0.801493710691824,
"recall": 0.8537735849056604,
"f1-score": 0.8179245283018868,
"support": 212
}
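To compare runs without opening each report by hand, the summary fields shown above can be read programmatically. This is a minimal sketch that assumes the report sits at results/intent_report.json and contains the keys from the example. The function names are illustrative and not part of this repository.

```python
import json
from pathlib import Path


def summarize_report(report):
    """Pick the overall metrics out of a parsed intent report dictionary."""
    return {
        "accuracy": report.get("accuracy"),
        "macro_f1": report.get("macro avg", {}).get("f1-score"),
        "weighted_f1": report.get("weighted avg", {}).get("f1-score"),
    }


def load_summary(path="results/intent_report.json"):
    """Load the JSON report from disk and return its overall metrics."""
    with Path(path).open(encoding="utf-8") as handle:
        return summarize_report(json.load(handle))
```

Calling load_summary() after a test run returns a small dictionary. Missing keys come back as None instead of raising an error.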
Additionally, the confusion matrix and confidence distribution histograms are generated automatically. For instance: