devmukul44 / search-suggestion-elasticsearch

Elasticsearch Flask Application written in Python to implement Search Suggestions, leveraging Fuzzy Search in elasticsearch.

Search Suggestions in Elasticsearch

Download Elasticsearch 2.4

Flask Application Installation

  • Python 2.7 and Flask 1.0.2 are used for development
  • Create a virtual environment: virtualenv venv; source venv/bin/activate
  • Install all the requirements: pip install -r requirements.txt
  • To execute flask app python

Flask Application Structure

  • (starting point, contains flask routes information)
  • controller
    • (scroll API is used for querying ES)
    • (bulk API is used for indexing ES)
    • (contains default Elasticsearch configuration)
  • static
    • es_mapping.json (contains ES index mapping)
    • sample.learn.logs.2016.json (raw file containing all the json documents)
  • templates
    • index.html

Flask Routes

  • By default flask application starts at
  • 2 routes are present in the application:
  • '/' -> Default route; contains a text box for the query
  • '/search/' -> GET searchtext; fuzzy matches the search text in Elasticsearch using the scroll API and renders index.html with the Elasticsearch results using Jinja2
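
As a rough illustration of what the '/search/' route might send to Elasticsearch, the sketch below builds a match query with fuzziness on search_term. The function name build_fuzzy_query and the rule of falling back to exact matching for text of 2 characters or fewer are assumptions made for this example, not taken from the project code.

```python
def build_fuzzy_query(search_text, field="search_term"):
    """Build a request body for a fuzzy match query (hypothetical helper)."""
    # Assumption: allow up to 2 edits once the text is longer than 2 characters,
    # and require an exact term match for shorter text.
    fuzziness = 2 if len(search_text) > 2 else 0
    return {
        "query": {
            "match": {
                field: {
                    "query": search_text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_fuzzy_query("elastc")
```

With the elasticsearch-py client, a body like this could be passed to elasticsearch.helpers.scan(client, query=body, preserve_order=True), which wraps the scroll API; whether the project calls it exactly this way is an assumption.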

Indexing Data to Elasticsearch

All the columns present in the raw file are pushed (indexed) to Elasticsearch, but analysis is performed only on search_term; all other columns are not_analyzed during indexing. Also, only documents whose result_type is SR are indexed.

To index data, follow these steps:

  • Go to controller: cd <search_suggestion>/controller/
  • Execute python script python

Data is indexed using the Elasticsearch Bulk API.
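
The Bulk API expects newline-delimited JSON in which each document is preceded by an action line. The sketch below shows how records with result_type SR could be turned into such a payload. The helper name build_bulk_body and the index and type names search_logs and log are placeholders chosen for illustration; the real values live in the project's configuration.

```python
import json


def build_bulk_body(records, index_name="search_logs", doc_type="log"):
    """Return an NDJSON bulk payload containing only SR records (hypothetical helper)."""
    lines = []
    for record in records:
        # Only documents with result_type SR are indexed.
        if record.get("result_type") != "SR":
            continue
        # Action line, followed by the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The bulk payload must end with a trailing newline.
    return "\n".join(lines) + "\n"


sample = [
    {"search_term": "python", "result_type": "SR"},
    {"search_term": "java", "result_type": "XX"},
]
payload = build_bulk_body(sample)
```

In practice, elasticsearch.helpers.bulk from the elasticsearch-py client can build and send this payload from a sequence of action dictionaries, so hand-building the body is mainly useful for seeing what goes over the wire.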

Querying Data in Elasticsearch

  • Fuzzy searches Elasticsearch for the provided search text (T)
  • Uses the Elasticsearch Scroll API to get all the matching documents
  • preserve_order is set to True, so documents are sorted in descending order of match score (V) [Assumption]
  • Internally, Elasticsearch uses Damerau-Levenshtein edit distance to achieve fuzzy search
  • For search text T longer than 2 characters, output results are at most 2 edit distance away from the search text
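
To make the edit distance rule concrete, here is a small pure-Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. It is only an illustration: Elasticsearch uses its own implementation and applies fuzziness per term, and the helper names edit_distance and within_two_edits are made up for this example.

```python
def edit_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def within_two_edits(text, candidate):
    """Case-insensitive check that candidate is at most 2 edits from text."""
    return edit_distance(text.lower(), candidate.lower()) <= 2
```

For example, edit_distance("search", "serach") is 1 because swapping the adjacent "a" and "r" is a single edit, so "serach" would still match "search" under the 2-edit limit.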

Elasticsearch mapping (es_mapping.json):

The objective of the analysis step is to transform each document into an inverted index and store it in a shard segment.

Analyzers are of 2 types:

  • Built-in
    • Simple analyzer
    • Whitespace analyzer
  • Custom

In this project, a custom analyzer is used in the mapping.

A custom analyzer consists of two steps:

  • Tokenization:
    • Standard (splits text on word boundaries)
    • NGram (Trigram is leveraged in this Project)
  • Filter
    • Removing stop words
    • Lowercasing
    • Stemming
    • Synonyms
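
The trigram tokenization step can be sketched in plain Python as follows. This is an approximation for illustration only: it assumes tokens are built from runs of letters and digits, and it folds lowercasing into the same function even though Elasticsearch applies lowercasing as a separate filter. The function name trigrams is hypothetical.

```python
import re


def trigrams(text, n=3):
    """Split text into overlapping n-character tokens, word by word."""
    tokens = []
    # Assumption: a word is a run of letters or digits; anything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        # Words shorter than n produce no tokens.
        tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens


tokens = trigrams("Elastic search")
# tokens == ['ela', 'las', 'ast', 'sti', 'tic', 'sea', 'ear', 'arc', 'rch']
```

Because a misspelled word still shares most of its trigrams with the correct spelling, indexing trigrams gives the fuzzy query more overlapping terms to match against.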

Mapping analysis used in this project:

"analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "ngram_tokenizer",
          "filter": [
            ...
          ]
        }
      },
      "filter": {
        "custom_english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "custom_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": [
            ...
          ]
        }
      }
}

Problem Statement

  • You are given a dataset named sample.learn.logs.2016.json.gz containing a sample of search queries on a website from 2016, with the following characteristics:
    • Data is in json format
    • Each record lists a specific search term, together with some related meta data
  • Create a subset of this initial dataset as follows:
    • Retain only records where the result_type field is SR
    • Let this pruned dataset be called Q
  • Create a web UI for querying this search log data with the following characteristics
    • Frontend UI needs to have only a simple textbox, and optionally, a Submit button
    • User enters search text T in the text box
    • On pressing Return in the textbox or clicking the Submit button, your system should pass the search text T to a backend server
    • The backend server returns, to the frontend UI, the search volume V associated with all log search queries Q’ from Q that are prefixed with the search text T, subject to the constraints below:
      • Your system must be smart enough to incorporate basic fuzziness (case insensitive, punctuation insensitive) and forgive minor spelling mistakes in the prefix matching
      • If search text T contains 3+ characters, your system should return all search queries Q’ that are at most 2 edit distance away from T
    • The frontend UI displays all returned (Q’, V) in descending order of V
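
The case-insensitive, punctuation-insensitive prefix matching and the ordering by V described above can be sketched without Elasticsearch. This example deliberately ignores spelling mistakes and covers only normalization, prefix matching, and sorting. The helper names and the representation of records as (query, volume) pairs are assumptions for illustration; the dataset's actual field names are not shown in this document.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    stripped = re.sub(r"[^\w\s]|_", "", text.lower())
    return " ".join(stripped.split())


def prefix_matches(search_text, records):
    """Return (query, volume) pairs whose query starts with search_text,
    sorted by volume in descending order."""
    prefix = normalize(search_text)
    matched = [(q, v) for q, v in records if normalize(q).startswith(prefix)]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


records = [("Python, basics", 10), ("python advanced", 25), ("java", 40)]
results = prefix_matches("PYTHON!", records)
# results == [('python advanced', 25), ('Python, basics', 10)]
```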
