alfredfrancis / ai-chatbot-framework

A Python chatbot framework with Natural Language Understanding and Artificial Intelligence.


Language support

mancvso opened this issue

What are the steps to change the language used in this software?

I've seen that https://spacy.io/models/ has models for:

English
German
Spanish
Portuguese
French
Italian
Dutch
Multi-Language

And the Dockerfile has the instruction

RUN python -m spacy download en; python

But what I'm still trying to understand is the WordNet data and the other NLTK resources:

What should I modify to change the language used in the conversations, models, etc?

I'm interested in Spanish, but I would happily update the README with instructions for other languages.
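From what I can tell, the language is hardcoded in two kinds of places: the spacy.load('en') calls in each classifier and the NLTK stopwords corpus name ('english'). A minimal sketch of what a lookup could look like (the names and the helper here are hypothetical, not part of the framework):

```python
# Hypothetical mapping from a language code to the two resources the
# classifiers load: the spaCy model name and the NLTK stopwords corpus.
SPACY_MODEL = {"en": "en", "es": "es", "de": "de"}
NLTK_STOPWORDS_CORPUS = {"en": "english", "es": "spanish", "de": "german"}

def language_resources(lang):
    """Return (spaCy model name, NLTK stopwords corpus) for a language code."""
    if lang not in SPACY_MODEL or lang not in NLTK_STOPWORDS_CORPUS:
        raise ValueError("no resources configured for %r" % lang)
    return SPACY_MODEL[lang], NLTK_STOPWORDS_CORPUS[lang]

# A classifier would then do something like:
#   self.spacynlp = spacy.load(language_resources(lang)[0])
#   self.stopwords = set(stopwords.words(language_resources(lang)[1]))
print(language_resources("es"))
```

This assumes the matching spaCy model and NLTK stopwords data have already been downloaded, as in the Dockerfile changes below.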

So far, the relevant bits for Spanish are:

diff --git a/app/nlu/classifiers/sklearn_intent_classifer.py b/app/nlu/classifiers/sklearn_intent_classifer.py
index 84be268..f3b2b1e 100755
--- a/app/nlu/classifiers/sklearn_intent_classifer.py
+++ b/app/nlu/classifiers/sklearn_intent_classifer.py
@@ -17,7 +17,7 @@ class SklearnIntentClassifier:
 
         self.model = None
 
-        self.spacynlp = spacy.load('en')
+        self.spacynlp = spacy.load('es')
 
         self.stopwords = set(stopwords.words('english') +
                              ["n't", "'s", "'m", "ca"] +
diff --git a/app/nlu/classifiers/starspace_intent_classifier.py b/app/nlu/classifiers/starspace_intent_classifier.py
index f334b98..058e327 100644
--- a/app/nlu/classifiers/starspace_intent_classifier.py
+++ b/app/nlu/classifiers/starspace_intent_classifier.py
@@ -110,7 +110,7 @@ class EmbeddingIntentClassifier:
         self.intent_placeholder = intent_placeholder
         self.embedding_placeholder = embedding_placeholder
         self.similarity_op = similarity_op
-        self.nlp = spacy.load('en')
+        self.nlp = spacy.load('es')
         self.vect = vectorizer
         self.use_word_vectors = use_word_vectors
 
diff --git a/app/nlu/classifiers/tf_intent_classifer.py b/app/nlu/classifiers/tf_intent_classifer.py
index bfc5dd9..0b972db 100755
--- a/app/nlu/classifiers/tf_intent_classifer.py
+++ b/app/nlu/classifiers/tf_intent_classifer.py
@@ -17,7 +17,7 @@ class TfIntentClassifier:
 
     def __init__(self):
         self.model = None
-        self.nlp = spacy.load('en')
+        self.nlp = spacy.load('es')
         self.label_encoder = LabelBinarizer()
         self.graph = None
 
diff --git a/dockerfile b/dockerfile
index 1701892..64162e0 100644
--- a/dockerfile
+++ b/dockerfile
@@ -6,13 +6,15 @@ COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
 
 RUN python -m nltk.downloader "averaged_perceptron_tagger"; python
+RUN python -m nltk.downloader "spanish_grammars"; python
 RUN python -m nltk.downloader "punkt"; python
 RUN python -m nltk.downloader "stopwords"; python
 RUN python -m nltk.downloader "wordnet"; python
-RUN python -m spacy download en; python
+RUN python -m nltk.downloader "perluniprops"; python
+RUN python -m spacy download es; python
 
 EXPOSE 8080
 
 COPY . .
 
-CMD ["make","run_docker"]
\ No newline at end of file
+CMD ["make","run_docker"]

Hi, I've managed to replace the stopwords and use the spaCy model for my language (es).

Instead of closing this issue, why don't we write some documentation about it? I could even work on a PR that adds some kind of parameter to specify the language.
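To sketch the parameter idea: the language could come from a single setting (an environment variable, say; CHATBOT_LANG is an invented name here, not something the framework reads today) instead of being hardcoded as 'en' in every classifier:

```python
import os

# Hypothetical configuration helper: resolve the language once, then let
# every classifier derive its resource names from it.
DEFAULT_LANG = "en"
STOPWORDS_CORPUS = {"en": "english", "es": "spanish"}

def configured_language(env=None):
    """Read the language code from the environment, defaulting to English."""
    env = os.environ if env is None else env
    return env.get("CHATBOT_LANG", DEFAULT_LANG)

def stopwords_corpus(lang):
    """Map a language code to the NLTK stopwords corpus name."""
    return STOPWORDS_CORPUS.get(lang, STOPWORDS_CORPUS[DEFAULT_LANG])

# Inside a classifier __init__ this would replace the hardcoded calls
# (spaCy line left as a comment so the sketch runs without a model):
lang = configured_language({"CHATBOT_LANG": "es"})
# self.nlp = spacy.load(lang)
print(lang, stopwords_corpus(lang))
```

The Dockerfile would then only need to download the resources for the configured language.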