The goal of the implementation is to scan an index database of documents to find locations of a target sequence of words (the request) and return the document IDs and the positions.
This implementation was carried out under the guidance of a global US-based search engine leader and then extended to improve performance. UPIS focuses on queries for a chain of one to five words in an indexed document database; average throughput and latency are the performance criteria used to evaluate the PIM architecture implementation.
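To make the goal concrete, here is a minimal CPU-only sketch of a phrase lookup over a positional index. It is an illustration only, not the UPMEM Host/DPU implementation: the names `build_index` and `find_phrase`, the in-memory dictionary layout, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy positional index: word -> {doc_id: [positions]}.

    `docs` maps a document ID to its text. Tokenization is a plain
    lowercase whitespace split (an assumption for this sketch).
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def find_phrase(index, phrase):
    """Return sorted (doc_id, start_position) pairs where `phrase` occurs."""
    words = phrase.lower().split()
    # An empty request or an unknown word cannot match anything.
    if not words or any(w not in index for w in words):
        return []

    # Only documents that contain every word of the request can match.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])

    hits = []
    for doc_id in sorted(candidates):
        # Position sets of the 2nd, 3rd, ... words in this document.
        later = [set(index[w][doc_id]) for w in words[1:]]
        for start in index[words[0]][doc_id]:
            # The k-th following word must sit exactly k + 1 positions later.
            if all(start + k + 1 in later[k] for k in range(len(later))):
                hits.append((doc_id, start))
    return hits


docs = {
    1: "the quick brown fox",
    2: "a quick brown dog and a quick brown fox",
}
index = build_index(docs)
matches = find_phrase(index, "quick brown fox")  # [(1, 1), (2, 6)]
```

Each result pairs a document ID with the word position at which the request starts, which mirrors the "document IDs and positions" output described above. How the real index is laid out in MRAM and searched by the DPUs is not covered by this sketch.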
This program was developed by the UPMEM team. Contact us at contact@upmem.com if you would like more details about this implementation (workflow structure, benchmarks, etc.).
common directory contains files common to Host and DPU code
dpu directory contains the DPU code (i.e., the code running on the memory)
host directory contains the Host code (i.e., the code running on the CPU)
datasets directory contains some datasets for testing / demo
tools directory contains related utilities (e.g., the indexing program)
To build the program and tools, run:
make
The following commands will run small integration tests:
make run
make check
To run a larger dataset, see the datasets/wikipedia directory.
Use the index builder program in the tools directory to create an index for a new set of files.
Example:
./tools/index_builder/index_builder_cpp --dictionary_file_name=dict.txt --input_directory_name=files --nb_mrams=2560 --output_file_prefix=index --assign_strategy=file_size
See the index_builder_cpp help command for details.