aniket22n / Transliteration-based-search-engine

Transliteration-based-search-engine

  • Upload files (PDF, text, or image) that contain Hindi text. The text is extracted using an OCR (optical character recognition) tool and stored in the database.
        MongoDB is used to store the file data. For every file, a new document is created with two fields:
        1. file_name - the name of the file  2. content - the content of the file
  • Search for a Hindi string using English script. The system transliterates the query from English script to Hindi script.
        Transliteration is the process of converting text from one script to another script.
        Example: "namaste" --> "नमस्ते" (same pronunciation)
  • The system then searches for the Hindi string in all documents in the database and returns the file names, matching accuracy, and matching content wherever the string is best matched.
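
The search step described above can be sketched as follows. This is only a minimal illustration, not the repository's actual implementation. It makes three assumptions: the query has already been transliterated to Hindi script, the MongoDB collection is replaced by an in-memory list of dicts with the same two fields (file_name, content), and "matching accuracy" is approximated with the standard library's difflib.SequenceMatcher over a sliding window. The names documents, best_match, search, and threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory stand-in for the MongoDB collection:
# one dict per uploaded file, with the two fields described above.
documents = [
    {"file_name": "greeting.txt", "content": "नमस्ते दुनिया"},
    {"file_name": "notes.txt", "content": "यह एक परीक्षण है"},
]


def best_match(query, content):
    """Return (score, snippet) for the window of content most similar to query.

    The score is a similarity ratio between 0.0 and 1.0.
    """
    n = len(query)
    if n == 0 or not content:
        return 0.0, ""
    best_score, best_snippet = 0.0, ""
    # Slide a window of the query's length across the content.
    for start in range(max(1, len(content) - n + 1)):
        window = content[start:start + n]
        score = SequenceMatcher(None, query, window).ratio()
        if score > best_score:
            best_score, best_snippet = score, window
    return best_score, best_snippet


def search(query, docs, threshold=0.6):
    """Return matching files, best match first.

    threshold is an assumed cutoff for what counts as a match.
    """
    results = []
    for doc in docs:
        score, snippet = best_match(query, doc["content"])
        if score >= threshold:
            results.append({
                "file_name": doc["file_name"],
                "accuracy": round(score * 100, 1),  # percentage
                "match": snippet,
            })
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)


if __name__ == "__main__":
    # Query assumed to be the transliteration of "namaste".
    for result in search("नमस्ते", documents):
        print(result["file_name"], result["accuracy"], result["match"])
```

With a real database, the list of dicts would be replaced by the documents read from the MongoDB collection. A character-level sliding window is simple but slow on large files, so a real system would likely index the content or compare word by word.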

Setup

Demonstration

Transliteration.based.search.engine.mp4.mp4

Work-flow of system

Transliteration Based Search Engine Flowchart (1)

Reference

Languages

Python 62.6%, HTML 21.0%, CSS 16.4%