infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.

How to solve the homophone problem

vvip2u opened this issue · comments

commented

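Problem description

The index contains a record named 刘德华. Searching for 【柳】, a homophone of 刘 (both are "liu" in pinyin), returns 刘德华 as a hit, which is not wanted.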
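
One likely cause of the false hit can be seen in a toy model. The sketch below is not the plugin's code: `TOY_PINYIN` is a hypothetical four-character lookup table and `to_pinyin` is a hypothetical helper. It assumes a setup where the same pinyin-only analysis runs at both index time and search time. Under that assumption 刘 and 柳 reduce to the same token, so the query cannot tell them apart.

```python
# Toy model, not the analysis-pinyin implementation.
# TOY_PINYIN is a hypothetical lookup table covering only this example.
TOY_PINYIN = {"刘": "liu", "柳": "liu", "德": "de", "华": "hua"}


def to_pinyin(text):
    """Replace each known character with its pinyin and keep nothing else."""
    return [TOY_PINYIN[ch] for ch in text if ch in TOY_PINYIN]


# Assumption: the index stores pinyin only, and the query is converted the same way.
indexed_tokens = set(to_pinyin("刘德华"))
query_tokens = to_pinyin("柳")

# Both 刘 and 柳 collapse to "liu", so the query matches the record.
false_hit = all(tok in indexed_tokens for tok in query_tokens)
print(sorted(indexed_tokens), query_tokens, false_hit)
```

Once the original character is discarded on both sides, nothing is left to distinguish the two names, so any fix has to keep the characters somewhere in the index.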

Action

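  • Answering my own question. If anyone has a better solution, please post it in the comments.
  • My solution
    • Searches can tell homophones apart
    • Follow-up improvement: highlight only the parts that should be highlighted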
commented

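### Temporary solution 1

Prerequisites

A search feature for names is needed.
Suppose the name is 刘德华.
It should be found by searching for any one of the following: 刘德华, liu, de, hua, 刘, 德, 华
P.S. Only search is supported; highlighting is not supported yet.

Steps

Step 1: create the index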

PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "chinese_analyzer": {
          "tokenizer": "chinese_chars_tokenizer"
        },
        "pinyin_analyzer": {
          "tokenizer": "pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "chinese_chars_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 50,
          "keep_separate_chinese": true,
          "lowercase": true
        },
        "pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": false,
          "limit_first_letter_length": 50,
          "keep_separate_chinese": true,
          "lowercase": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "search_analyzer": "chinese_analyzer",
        "analyzer": "pinyin_analyzer"
      }
    }
  }
}
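
The effect of pairing two different analyzers can be sketched with a toy model. This is not the plugin's code: `TOY_PINYIN`, `index_tokens`, `search_tokens`, and `matches` are hypothetical names, and the token sets are an assumption about what the two tokenizers above emit (single characters plus their pinyin at index time; single characters only at search time, with Latin input passed through lowercased). The model also ignores token positions, which `match_phrase` does take into account. Running `GET /test_index/_analyze` on a real cluster shows the actual tokens.

```python
# Toy model of the index/search analyzer split; not the plugin implementation.
# TOY_PINYIN is a hypothetical lookup table covering only this example.
TOY_PINYIN = {"刘": "liu", "柳": "liu", "德": "de", "华": "hua"}


def is_chinese(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def index_tokens(text):
    """Assumed model of pinyin_analyzer: each character plus its pinyin."""
    tokens = set()
    for ch in text:
        if is_chinese(ch):
            tokens.add(ch)
            if ch in TOY_PINYIN:
                tokens.add(TOY_PINYIN[ch])
    return tokens


def search_tokens(query):
    """Assumed model of chinese_analyzer: characters stay characters,
    and a purely Latin query passes through lowercased."""
    if any(is_chinese(ch) for ch in query):
        return [ch for ch in query if is_chinese(ch)]
    return [query.lower()]


def matches(doc, query):
    """A document matches when every query token is present in its index tokens."""
    indexed = index_tokens(doc)
    return all(tok in indexed for tok in search_tokens(query))


for q in ["刘", "liu", "刘德华", "柳"]:
    print(q, matches("刘德华", q))
```

In this model 刘, liu, and 刘德华 match while 柳 does not, because the query side never converts a Chinese character to pinyin.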

Step 2: index a document

PUT /test_index/_doc/1
{
  "name": "刘德华"
}

Step 3: verify

Verification 1
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "刘"
    }
  }
}
Verification 2
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "liu"
    }
  }
}
Verification 3
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "刘德华"
    }
  }
}
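
The three checks above are all positive cases, while the original problem is that 【柳】 should not match. A minimal sketch of a probe script that adds that negative check follows. It assumes an Elasticsearch node reachable without authentication at `http://localhost:9200` (a hypothetical address), and `PROBES`, `build_request`, `hit_count`, and `run_probes` are names made up for this illustration.

```python
import json
import urllib.request

# (query term, whether a hit on 刘德华 is expected)
PROBES = [("刘", True), ("liu", True), ("刘德华", True), ("柳", False)]


def build_request(base_url, term):
    """Build the same match_phrase search used in the verification steps."""
    body = {"query": {"match_phrase": {"name": term}}}
    return urllib.request.Request(
        url=f"{base_url}/test_index/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def hit_count(base_url, term):
    """Send the search and return the number of hits."""
    with urllib.request.urlopen(build_request(base_url, term)) as resp:
        result = json.load(resp)
    total = result["hits"]["total"]
    # hits.total is an object in newer versions and a plain integer in older ones.
    return total["value"] if isinstance(total, dict) else total


def run_probes(base_url="http://localhost:9200"):
    """Print one line per probe, flagging results that differ from expectation."""
    for term, expected in PROBES:
        got = hit_count(base_url, term) > 0
        print(term, "ok" if got == expected else "UNEXPECTED", got)
```

Calling `run_probes()` against a cluster that holds the index from step 1 and the document from step 2 prints one line per probe; the 柳 line is the one that shows whether the homophone problem is gone.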