Support for Swedish [sv-SE] OCR
Yavari opened this issue · comments
Payam Yavari commented
Can you please add support for the Swedish language, or guide me on how to do it so that I can open a pull request?
Payam Yavari commented
Here is some code I am using in another project. Please let me know if you want me to create a pull request.
{
  "analysis": {
    "analyzer": {
      "ambar_sv": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "icu_folding_se",
          "swedish_stop",
          "swedish_stemmer"
        ]
      }
    },
    "filter": {
      "swedish_stemmer": {
        "type": "stemmer",
        "language": "swedish"
      },
      "swedish_stop": {
        "type": "stop",
        "stopwords": "_swedish_"
      },
      "icu_folding_se": {
        "type": "icu_folding",
        "unicodeSetFilter": "[^åäöÅÄÖ]"
      }
    }
  }
}
The analysis-icu plugin needs to be installed for icu_folding:
RUN bin/elasticsearch-plugin install analysis-icu
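In case it helps review, here is a rough Python sketch that rebuilds the settings above as a dict and checks that every filter the analyzer references is either defined or built in. It assumes the analyzer sits under analysis.analyzer and the filter definitions under analysis.filter, which is the usual Elasticsearch index settings layout. The function names and the BUILTIN_FILTERS set are made up for illustration and are not part of Ambar.

```python
import json

# Built-in token filters used by name in this sketch (assumption: not an exhaustive list).
BUILTIN_FILTERS = {"lowercase"}


def swedish_analysis_settings():
    """Return the Swedish analysis settings from this comment as a dict (hypothetical helper)."""
    return {
        "analysis": {
            "analyzer": {
                "ambar_sv": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "icu_folding_se",
                        "swedish_stop",
                        "swedish_stemmer",
                    ],
                }
            },
            "filter": {
                "swedish_stemmer": {"type": "stemmer", "language": "swedish"},
                "swedish_stop": {"type": "stop", "stopwords": "_swedish_"},
                # Fold everything except å, ä, ö, which are distinct letters in Swedish.
                "icu_folding_se": {
                    "type": "icu_folding",
                    "unicodeSetFilter": "[^åäöÅÄÖ]",
                },
            },
        }
    }


def undefined_filters(settings):
    """Return (analyzer, filter) pairs whose filter is neither defined nor built in."""
    analysis = settings["analysis"]
    defined = set(analysis.get("filter", {})) | BUILTIN_FILTERS
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for filter_name in analyzer.get("filter", []):
            if filter_name not in defined:
                missing.append((analyzer_name, filter_name))
    return missing


if __name__ == "__main__":
    settings = swedish_analysis_settings()
    print("undefined filters:", undefined_filters(settings))
    # Body that could be sent when creating an index; keep å/ä/ö readable.
    print(json.dumps({"settings": settings}, ensure_ascii=False, indent=2))
```

The check only catches typos in filter names. Whether icu_folding actually loads still depends on the plugin being installed on the Elasticsearch node.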
Payam Yavari commented
I guess https://github.com/RD17/ambar/blob/master/Pipeline/Dockerfile also needs the following line:
tesseract-ocr-swe \
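To confirm the package took effect inside the image, something like the following Python sketch could be used. It parses the output of `tesseract --list-langs` and looks for `swe`, the Tesseract code for Swedish. The helper names are hypothetical, and it assumes the header line starts with "List of", which is what the Tesseract versions I have seen print.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` style output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def has_swedish(output):
    """True if the Swedish language data (code `swe`) is listed."""
    return "swe" in parse_langs(output)


def installed_langs():
    """Run tesseract and parse its language list.

    Some versions print the list to stderr, so both streams are read.
    Any warnings tesseract prints would show up as extra entries.
    """
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("Swedish available:", "swe" in langs)
```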
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.