This program is a command-line interface (CLI) document search engine based on the term frequency-inverse document frequency (tf-idf) algorithm. It looks for occurrences of the search terms in a set of documents and ranks the documents by relevance.
- Search Functionality: Enter a term or phrase to search for within the documents.
- tf-idf Calculation: Calculates tf-idf scores for the searched terms across documents.
- Sorting: Sorts documents based on tf-idf scores.
- Document Viewing: Displays the path of the most relevant document containing the searched term.
- Continuous Search: Allows for continuous searching without exiting the program.
- Input Term to Search: Enter the term or phrase you want to search for.
- Results Display: View the tf-idf scores for each document containing the searched term.
- Document Viewing: Access the most relevant document by clicking on the displayed path.
To run the program:
- Compile the code using a C compiler.
- Execute the compiled program.
- Follow the on-screen instructions to search for documents.
- C compiler (e.g., gcc)
- Standard C library headers (assert.h, stdio.h, stdlib.h, string.h) plus the POSIX header dirent.h
$ ./search_engine
> Input term you want to search:
> algorithm
Make sure the documents you want to search are located in the directory the program reads from ("a\" in this case).
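To check that the document directory is readable before searching, a small helper along these lines may be useful. It is only a sketch using the POSIX `dirent.h` calls `opendir`, `readdir`, and `closedir`; the function name `list_documents` is hypothetical and the directory is passed in as a parameter rather than taken from the program's own configuration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every entry in dir_path (skipping "." and "..") and return
 * the number of entries found, or -1 if the directory cannot be opened.
 */
int list_documents(const char *dir_path) {
    assert(dir_path != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror(dir_path); /* report why the directory could not be opened */
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the current-directory and parent-directory entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        printf("%s/%s\n", dir_path, entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

A return value of 0 means the directory exists but is empty, so a search would have nothing to rank. A return value of -1 means the directory is missing or unreadable.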