shubham0204 / full-text-search

Full Text Search built for PDFs, DOCX using Inverted Index in Java

Full Text Search On Local Files With Inverted Index

Features

  • Text extraction from PDFs, Microsoft Word DOCX and text-based formats
  • Disk-persistence of inverted index
  • Validation of inverted index
  • Command-line utility

Setup

Make sure Java is installed on your system, with JAVA_HOME pointing to a JDK installation. Clone the project from the GitHub repository and build it with the gradlew script in the root of the repository:

$> git clone https://github.com/shubham0204/full-text-search
$> cd full-text-search
$> ./gradlew build

To run the tests:

$> ./gradlew test

To build the fat/uber JAR:

$> ./gradlew shadowJar

Usage

Index

$> java -jar fulltextsearch.jar index build [dir]
$> java -jar fulltextsearch.jar index info [dir]
$> java -jar fulltextsearch.jar index rm [dir]

Run fulltextsearch index --help for a description of each command.

Query

$> fulltextsearch query [dir]

Dependencies

Useful Resources

About

License:Apache License 2.0
