Language support
mancvso opened this issue
What are the steps to change the language used in this software?
I've seen that https://spacy.io/models/ lists models for:
English
German
Spanish
Portuguese
French
Italian
Dutch
Multi-Language
And the Dockerfile has the instruction
RUN python -m spacy download en; python
But what I'm still trying to understand is how WordNet and the other resources fit in:
What should I modify to change the language used in the conversations, models, etc.?
I'm interested in Spanish, but I would happily update the README with instructions for other languages.
So far, the relevant bits for Spanish are:
diff --git a/app/nlu/classifiers/sklearn_intent_classifer.py b/app/nlu/classifiers/sklearn_intent_classifer.py
index 84be268..f3b2b1e 100755
--- a/app/nlu/classifiers/sklearn_intent_classifer.py
+++ b/app/nlu/classifiers/sklearn_intent_classifer.py
@@ -17,7 +17,7 @@ class SklearnIntentClassifier:
self.model = None
- self.spacynlp = spacy.load('en')
+ self.spacynlp = spacy.load('es')
self.stopwords = set(stopwords.words('english') +
["n't", "'s", "'m", "ca"] +
diff --git a/app/nlu/classifiers/starspace_intent_classifier.py b/app/nlu/classifiers/starspace_intent_classifier.py
index f334b98..058e327 100644
--- a/app/nlu/classifiers/starspace_intent_classifier.py
+++ b/app/nlu/classifiers/starspace_intent_classifier.py
@@ -110,7 +110,7 @@ class EmbeddingIntentClassifier:
self.intent_placeholder = intent_placeholder
self.embedding_placeholder = embedding_placeholder
self.similarity_op = similarity_op
- self.nlp = spacy.load('en')
+ self.nlp = spacy.load('es')
self.vect = vectorizer
self.use_word_vectors = use_word_vectors
diff --git a/app/nlu/classifiers/tf_intent_classifer.py b/app/nlu/classifiers/tf_intent_classifer.py
index bfc5dd9..0b972db 100755
--- a/app/nlu/classifiers/tf_intent_classifer.py
+++ b/app/nlu/classifiers/tf_intent_classifer.py
@@ -17,7 +17,7 @@ class TfIntentClassifier:
def __init__(self):
self.model = None
- self.nlp = spacy.load('en')
+ self.nlp = spacy.load('es')
self.label_encoder = LabelBinarizer()
self.graph = None
diff --git a/dockerfile b/dockerfile
index 1701892..64162e0 100644
--- a/dockerfile
+++ b/dockerfile
@@ -6,13 +6,15 @@ COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
RUN python -m nltk.downloader "averaged_perceptron_tagger"; python
+RUN python -m nltk.downloader "spanish_grammars"; python
RUN python -m nltk.downloader "punkt"; python
RUN python -m nltk.downloader "stopwords"; python
RUN python -m nltk.downloader "wordnet"; python
-RUN python -m spacy download en; python
+RUN python -m nltk.downloader "perluniprops"; python
+RUN python -m spacy download es; python
EXPOSE 8080
COPY . .
-CMD ["make","run_docker"]
\ No newline at end of file
+CMD ["make","run_docker"]
Hi, I've managed to replace the stopwords and use the spaCy models for my language (es).
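For reference, here is a minimal sketch of what the stopword replacement could look like. It is not the code from the repository: `build_stopwords`, `NLTK_STOPWORD_NAMES` and `EXTRA_STOPWORDS` are hypothetical names made up for illustration. It assumes the NLTK "stopwords" corpus has already been downloaded (the Dockerfile above does that) and that the English-only tokens (`n't`, `'s`, ...) should not be carried over to other languages.

```python
# Hypothetical helper, not part of the repository.
# Maps the short language code to the name used by NLTK's stopwords corpus.
NLTK_STOPWORD_NAMES = {
    "en": "english",
    "es": "spanish",
    "de": "german",
    "fr": "french",
    "it": "italian",
    "pt": "portuguese",
    "nl": "dutch",
}

# Tokenizer leftovers that the English classifier strips in addition to the
# corpus words; other languages get no extras in this sketch.
EXTRA_STOPWORDS = {"en": ["n't", "'s", "'m", "ca"]}


def build_stopwords(lang, corpus_words=None):
    """Return the stopword set for `lang`.

    `corpus_words` lets the caller inject a word list (handy for tests).
    When it is omitted, the list is read from NLTK's stopwords corpus.
    """
    if lang not in NLTK_STOPWORD_NAMES:
        raise ValueError("no stopword list configured for %r" % lang)
    if corpus_words is None:
        # Imported lazily so the module can be imported without NLTK data.
        from nltk.corpus import stopwords
        corpus_words = stopwords.words(NLTK_STOPWORD_NAMES[lang])
    return set(corpus_words) | set(EXTRA_STOPWORDS.get(lang, []))
```

In the classifier, the `self.stopwords = set(stopwords.words('english') + ...)` assignment would then become something like `self.stopwords = build_stopwords('es')`.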
Instead of closing this issue, why don't we write some documentation about it? I can even work on a PR that adds a parameter to specify the language.
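A rough sketch of how such a parameter might work, so the three classifiers no longer hard-code `'en'` or `'es'`. Everything here is an assumption for discussion: the `APP_LANGUAGE` environment variable, `resolve_language` and `load_nlp` do not exist in the project. The codes are the short spaCy names used in the diff above; spaCy 3 dropped those shortcuts in favor of full package names such as `es_core_news_sm`, so the value passed to `spacy.load` would need adjusting there.

```python
import os

# Hypothetical setting name; the project does not define it.
LANGUAGE_ENV_VAR = "APP_LANGUAGE"
DEFAULT_LANGUAGE = "en"
# Short codes for the single-language models listed at the top of this issue.
SUPPORTED_LANGUAGES = {"en", "de", "es", "pt", "fr", "it", "nl"}


def resolve_language(environ=None):
    """Read the language code from the environment, falling back to English."""
    environ = os.environ if environ is None else environ
    lang = environ.get(LANGUAGE_ENV_VAR, DEFAULT_LANGUAGE).strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(
            "unsupported language %r, expected one of %s"
            % (lang, sorted(SUPPORTED_LANGUAGES))
        )
    return lang


def load_nlp(lang=None):
    """Load the spaCy model for `lang`, or for the configured language."""
    # Imported lazily so resolving the setting does not require spaCy.
    import spacy
    return spacy.load(lang or resolve_language())
```

Each classifier would then call `load_nlp()` in its constructor instead of `spacy.load('en')`, and the Dockerfile could download the matching model for the same setting.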