Support for Swedish [sv-SE] OCR

Question

Yavari opened this issue 5 years ago · comments

Can you please add support for the Swedish language, or guide me on how I can do it so that I can open a pull request?

Payam Yavari · Answer 1 · Wed Aug 14 2019 17:33:54 GMT+0800 (China Standard Time)

Here is some code I am using in another project. Please let me know if you want me to create a pull request.

    "ambar_sv": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "icu_folding_se",
        "swedish_stop",
        "swedish_stemmer"
      ]
    },

    "swedish_stemmer": {
      "type": "stemmer",
      "language": "swedish"
    },

    "swedish_stop": {
      "type": "stop",
      "stopwords": "_swedish_"
    },

    "icu_folding_se": {
      "type": "icu_folding",
      "unicodeSetFilter": "[^åäöÅÄÖ]"
    }

The analysis-icu plugin needs to be installed for icu_folding to work.

    RUN bin/elasticsearch-plugin install analysis-icu
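
For anyone wiring this up: in Elasticsearch the analyzer entry belongs under `analysis.analyzer` and the three filter entries under `analysis.filter`. The following is only a rough Python sketch of how the pieces might be combined and sanity-checked before sending them as index settings. The helper names `build_swedish_analysis` and `undefined_filters` are made up for illustration, and where exactly Ambar keeps its index settings is not shown here.

```python
import json

# Token filters that ship with Elasticsearch and need no custom definition.
# Only the one used by the analyzer above is listed here.
BUILTIN_FILTERS = {"lowercase"}


def build_swedish_analysis():
    """Return an `analysis` section combining the analyzer and filters above."""
    return {
        "analyzer": {
            "ambar_sv": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "icu_folding_se",
                    "swedish_stop",
                    "swedish_stemmer",
                ],
            }
        },
        "filter": {
            "swedish_stemmer": {"type": "stemmer", "language": "swedish"},
            "swedish_stop": {"type": "stop", "stopwords": "_swedish_"},
            # Needs the analysis-icu plugin; the set filter leaves å, ä, ö unfolded.
            "icu_folding_se": {
                "type": "icu_folding",
                "unicodeSetFilter": "[^åäöÅÄÖ]",
            },
        },
    }


def undefined_filters(analysis):
    """List filters that an analyzer references but that are neither built in nor defined."""
    defined = set(analysis.get("filter", {})) | BUILTIN_FILTERS
    missing = []
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in defined:
                missing.append(name)
    return missing


if __name__ == "__main__":
    analysis = build_swedish_analysis()
    assert undefined_filters(analysis) == []
    # Request body for creating an index with these settings.
    print(json.dumps({"settings": {"analysis": analysis}}, indent=2, ensure_ascii=False))
```

The check only catches a filter name that is referenced but never defined; it does not verify that the plugin is actually installed on the cluster.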

Payam Yavari · Answer 2 · Wed Aug 14 2019 18:53:57 GMT+0800 (China Standard Time)

I guess https://github.com/RD17/ambar/blob/master/Pipeline/Dockerfile also needs the following line:

    tesseract-ocr-swe \
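
To confirm that the Swedish data actually ended up in the image, something like the sketch below could be run inside the container. It relies on `tesseract --list-langs` and on `swe` being Tesseract's code for Swedish; the function names are hypothetical, and I have not checked how the Ambar pipeline itself invokes Tesseract, so treat this as an assumption rather than the project's method.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def has_swedish(output):
    """True if the Swedish traineddata ("swe") is listed."""
    return "swe" in parse_list_langs(output)


def tesseract_command(image_path, lang="swe"):
    """Build the argv for running OCR on one image and writing text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


if __name__ == "__main__":
    # Older Tesseract versions print the language list to stderr, so read both.
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    print("Swedish installed:", has_swedish(result.stdout + result.stderr))
```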

stale · Answer 3 · Thu Aug 29 2019 19:36:23 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.