Language support

Question

mancvso opened this issue 6 years ago

Alejandro Vidal Castillo commented 6 years ago

What are the steps to change the language used in this software?

I've seen that https://spacy.io/models/ lists models for:

English
German
Spanish
Portuguese
French
Italian
Dutch
Multi-Language

The Dockerfile also has the instruction:

RUN python -m spacy download en; python

What I'm still trying to work out is how WordNet and the other NLTK resources fit in.

What should I modify to change the language used in the conversations, models, etc.?

I'm interested in Spanish, but I would happily update the README with instructions for other languages.

Alejandro Vidal Castillo · Answer 1 · Fri Aug 17 2018 03:47:16 GMT+0800 (China Standard Time)

So far, the relevant changes for Spanish are:

diff --git a/app/nlu/classifiers/sklearn_intent_classifer.py b/app/nlu/classifiers/sklearn_intent_classifer.py
index 84be268..f3b2b1e 100755
--- a/app/nlu/classifiers/sklearn_intent_classifer.py
+++ b/app/nlu/classifiers/sklearn_intent_classifer.py
@@ -17,7 +17,7 @@ class SklearnIntentClassifier:
 
         self.model = None
 
-        self.spacynlp = spacy.load('en')
+        self.spacynlp = spacy.load('es')
 
         self.stopwords = set(stopwords.words('english') +
                              ["n't", "'s", "'m", "ca"] +
diff --git a/app/nlu/classifiers/starspace_intent_classifier.py b/app/nlu/classifiers/starspace_intent_classifier.py
index f334b98..058e327 100644
--- a/app/nlu/classifiers/starspace_intent_classifier.py
+++ b/app/nlu/classifiers/starspace_intent_classifier.py
@@ -110,7 +110,7 @@ class EmbeddingIntentClassifier:
         self.intent_placeholder = intent_placeholder
         self.embedding_placeholder = embedding_placeholder
         self.similarity_op = similarity_op
-        self.nlp = spacy.load('en')
+        self.nlp = spacy.load('es')
         self.vect = vectorizer
         self.use_word_vectors = use_word_vectors
 
diff --git a/app/nlu/classifiers/tf_intent_classifer.py b/app/nlu/classifiers/tf_intent_classifer.py
index bfc5dd9..0b972db 100755
--- a/app/nlu/classifiers/tf_intent_classifer.py
+++ b/app/nlu/classifiers/tf_intent_classifer.py
@@ -17,7 +17,7 @@ class TfIntentClassifier:
 
     def __init__(self):
         self.model = None
-        self.nlp = spacy.load('en')
+        self.nlp = spacy.load('es')
         self.label_encoder = LabelBinarizer()
         self.graph = None
 
diff --git a/dockerfile b/dockerfile
index 1701892..64162e0 100644
--- a/dockerfile
+++ b/dockerfile
@@ -6,13 +6,15 @@ COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
 
 RUN python -m nltk.downloader "averaged_perceptron_tagger"; python
+RUN python -m nltk.downloader "spanish_grammars"; python
 RUN python -m nltk.downloader "punkt"; python
 RUN python -m nltk.downloader "stopwords"; python
 RUN python -m nltk.downloader "wordnet"; python
-RUN python -m spacy download en; python
+RUN python -m nltk.downloader "perluniprops"; python
+RUN python -m spacy download es; python
 
 EXPOSE 8080
 
 COPY . .
 
-CMD ["make","run_docker"]
\ No newline at end of file
+CMD ["make","run_docker"]
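The diff hardcodes the language in three separate classifiers. A minimal sketch of keeping that choice in one place, assuming a hypothetical shared module (for example `app/nlu/language.py`, not part of the project) that each classifier imports instead of calling `spacy.load('en')` directly. Short codes such as `'en'` and `'es'` only work as shortcuts in spaCy 2.x; spaCy 3 removed shortcut links and needs the full package name (e.g. `python -m spacy download es_core_news_sm`), which is why the mapping below uses package names.

```python
from functools import lru_cache

# Hypothetical mapping from the short codes used in this issue to spaCy's
# small pipeline packages (as listed on https://spacy.io/models/).
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "es": "es_core_news_sm",
    "pt": "pt_core_news_sm",
    "fr": "fr_core_news_sm",
    "it": "it_core_news_sm",
    "nl": "nl_core_news_sm",
    "xx": "xx_ent_wiki_sm",  # multi-language
}


def spacy_model_name(lang: str) -> str:
    """Return the spaCy package name for a short language code."""
    try:
        return SPACY_MODELS[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None


@lru_cache(maxsize=None)
def load_nlp(lang: str):
    """Load the spaCy pipeline for `lang` once and reuse it afterwards."""
    import spacy  # imported lazily so the mapping works without spaCy installed

    return spacy.load(spacy_model_name(lang))
```

Each classifier would then use `self.nlp = load_nlp('es')` (or the configured language); because of the cache, the three classifiers in one process share a single loaded pipeline instead of loading the model three times.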

Alejandro Vidal Castillo · Answer 2 · Wed Sep 26 2018 05:20:14 GMT+0800 (China Standard Time)

Hi, I've managed to replace the stopwords and use spaCy models for my language (es).
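The first diff still builds the stopword set from `stopwords.words('english')` plus English contraction fragments (`"n't"`, `"'s"`, `"'m"`, `"ca"`), which mean nothing for Spanish. A minimal sketch of a language-aware version, assuming a hypothetical `EXTRA_STOPWORDS` table; NLTK's stopwords corpus names its lists by full language name (`'english'`, `'spanish'`, ...), so a short code like `es` has to be mapped first.

```python
from typing import Iterable, Optional

# NLTK's stopwords corpus uses full language names as file ids.
NLTK_STOPWORD_LANGS = {
    "en": "english",
    "de": "german",
    "es": "spanish",
    "pt": "portuguese",
    "fr": "french",
    "it": "italian",
    "nl": "dutch",
}

# Hypothetical per-language additions; the English entries mirror the
# fragments visible in the original SklearnIntentClassifier.
EXTRA_STOPWORDS = {
    "en": ["n't", "'s", "'m", "ca"],
}


def build_stopwords(lang: str, base_words: Optional[Iterable[str]] = None) -> set:
    """Return the stopword set for `lang`.

    If `base_words` is None, the list is read from NLTK's stopwords corpus,
    which must already be downloaded (python -m nltk.downloader stopwords).
    """
    if base_words is None:
        from nltk.corpus import stopwords  # lazy: only needed to read NLTK data

        base_words = stopwords.words(NLTK_STOPWORD_LANGS[lang])
    return set(base_words) | set(EXTRA_STOPWORDS.get(lang, []))
```

In the classifier this would become `self.stopwords = build_stopwords('es')`, so the English-only fragments are no longer mixed into the Spanish set.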

Instead of closing this issue, why don't we write some documentation about it? I could even work on a PR that adds some kind of parameter to specify the language.
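One possible shape for such a parameter, as a sketch only: a hypothetical `BOT_LANGUAGE` environment variable (the project does not read it today) that defaults to English and also decides which NLTK packages a build step downloads. The package lists come from the Dockerfile diff in the first answer; whether other languages need extra packages is left open.

```python
import os

# Packages the original Dockerfile downloads for English.
BASE_NLTK_PACKAGES = ["averaged_perceptron_tagger", "punkt", "stopwords", "wordnet"]

# Extra packages the Spanish diff adds; other languages may need their own.
EXTRA_NLTK_PACKAGES = {
    "es": ["spanish_grammars", "perluniprops"],
}

SUPPORTED_LANGUAGES = {"en", "de", "es", "pt", "fr", "it", "nl"}


def get_language(environ=None) -> str:
    """Read the hypothetical BOT_LANGUAGE setting, defaulting to English."""
    environ = os.environ if environ is None else environ
    lang = environ.get("BOT_LANGUAGE", "en").strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"BOT_LANGUAGE={lang!r} is not supported")
    return lang


def nltk_packages_for(lang: str) -> list:
    """Return the NLTK data packages to download for `lang`."""
    return BASE_NLTK_PACKAGES + EXTRA_NLTK_PACKAGES.get(lang, [])
```

A build script run from the Dockerfile could then loop over `nltk_packages_for(get_language())` and call `nltk.download(package)` for each entry, replacing the per-language `RUN python -m nltk.downloader ...` lines.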