kwh44 / parallel_indexing

Parallel indexing of text files

Description:

Counts the words in an input data file and writes the resulting index twice: once sorted alphabetically and once sorted by usage count.
Command-line input: the path to a config file, which specifies:

  • input data filename (supports txt, zip, tar, gz, ar, etc.)
  • output filename for the index sorted alphabetically
  • output filename for the index sorted by count (from the most frequent word to the least frequent)
  • number of threads to utilize
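
The indexing scheme described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it uses only the standard library, treats words as runs of ASCII letters (the real tool relies on Boost.Locale and ICU for word boundaries), and the names `count_words`, `index_parallel` and `sorted_by_count` are hypothetical. Each thread counts the words in its own chunk of text, the partial maps are merged, and the result is available both in alphabetical order and by descending count.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// std::map keeps keys sorted, so iterating it yields the alphabetical index.
using WordCounts = std::map<std::string, std::size_t>;

// Count lowercased ASCII words in text[begin, end).
WordCounts count_words(const std::string& text, std::size_t begin, std::size_t end) {
    WordCounts counts;
    std::string word;
    for (std::size_t i = begin; i < end; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c)) {
            word.push_back(static_cast<char>(std::tolower(c)));
        } else if (!word.empty()) {
            ++counts[word];
            word.clear();
        }
    }
    if (!word.empty()) ++counts[word];
    return counts;
}

// Split the text into n_threads chunks, count each chunk in its own thread,
// then merge the partial results.
WordCounts index_parallel(const std::string& text, std::size_t n_threads) {
    if (n_threads == 0) n_threads = 1;

    // Chunk boundaries, pushed forward past any letters so no word is split.
    std::vector<std::size_t> bounds{0};
    for (std::size_t t = 1; t < n_threads; ++t) {
        std::size_t pos = std::max(bounds.back(), text.size() * t / n_threads);
        while (pos < text.size() &&
               std::isalpha(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        bounds.push_back(pos);
    }
    bounds.push_back(text.size());

    // Each worker writes only to its own slot, so no locking is needed.
    std::vector<WordCounts> partial(n_threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&text, &bounds, &partial, t] {
            partial[t] = count_words(text, bounds[t], bounds[t + 1]);
        });
    }
    for (auto& w : workers) w.join();

    WordCounts total;
    for (const auto& p : partial) {
        for (const auto& [word, n] : p) total[word] += n;
    }
    return total;
}

// Return the entries ordered from the most frequent word to the least frequent.
// stable_sort keeps alphabetical order among words with equal counts.
std::vector<std::pair<std::string, std::size_t>> sorted_by_count(const WordCounts& counts) {
    std::vector<std::pair<std::string, std::size_t>> v(counts.begin(), counts.end());
    std::stable_sort(v.begin(), v.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return v;
}
```

Calling `index_parallel(text, 4)` returns the alphabetical index directly, and passing that result to `sorted_by_count` gives the by-count ordering. Merging per-thread maps after the workers finish avoids a shared, locked map, at the cost of some extra memory.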

Requirements

  • Boost.Locale >= 1.68.0
  • ICU >= 62.1
  • libarchive >= 3.3.3
  • Compiler supporting C++17 standard

How to build and run

  • mkdir build; cd build; cmake ..; make -j4
  • ./parallel_indexing <path_to_config_file>

© Created by Andrii Maistruk and Anatolii Iatsuk.


Languages

C++ 80.4%, Python 13.8%, CMake 5.8%