Blazingly fast search for Wikipedia
- Install libstudxml from the official source code (this project uses version 1.0.1).
- Locate the file `libstudxml.a` from the previous step; it is most likely in `/usr/local/lib`. If not, check the output of the installation to find the directory. Once done, set its path in the Makefile variable `$(LIBSTUDXML_EXE)`.
- Remaining dependencies are directly installed via the Makefile. So, if you already have any of them installed, feel free to edit the Makefile accordingly.

- `make`: compiles bzip2, stemmer, indexer and searcher.
- `make clean`: cleans all binaries from this project.
- `make clean_full`: (DANGER) cleans all dependencies' binaries as well.

- Create a new directory `libstudbin` somewhere in your home (non-sudo) directory.
- When installing libstudxml, instead of `./configure; make; sudo make install`, run `./configure --prefix=/path/to/libstudbin; make; make install`.
- You should find a file `libstudxml.a` inside `libstudbin/lib`. In this project's Makefile, update `LIBSTUDXML_EXE` to this file's path.
- Finally, replace `$(CXX)` with `$(CXX) -I/path/to/libstudbin/include` in the Makefile in both the runner and searcher targets.

You may need to append `--std=c++14` to the compilation command.
Use `src/index.sh` and `src/search.sh`. Usage is documented in the comments of each file.
These categories are currently supported: Title, Infobox, Body, Category, Links, and References.
To refer to these categories, use their first character in lower case. So, the query `t:egypt i:nile` searches for pages with `egypt` in their title and `nile` in their infobox.
If you do not specify a category, it is assumed to be the body by default.
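
The prefix rule above can be sketched roughly as follows. This is only an illustration, not the project's actual parser: the function name `parseQuery` and the decision to treat any unrecognised prefix as part of a body term are assumptions.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from the project's source.
// Splits a query on whitespace and pairs each term with a category letter.
// A term of the form "x:word" uses category x when x is one of the six
// supported letters; any other term is treated as a body ('b') term.
std::vector<std::pair<char, std::string>> parseQuery(const std::string& query) {
    // Title, Infobox, Body, Category, Links, References
    const std::string categories = "tibclr";
    std::vector<std::pair<char, std::string>> terms;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        if (token.size() > 2 && token[1] == ':' &&
            categories.find(token[0]) != std::string::npos) {
            terms.emplace_back(token[0], token.substr(2));
        } else {
            terms.emplace_back('b', token);  // default category: body
        }
    }
    return terms;
}
```

With this sketch, `parseQuery("t:egypt i:nile")` yields a title term `egypt` and an infobox term `nile`, while `parseQuery("people")` yields a single body term.
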
i:egypt
i:sachin
people
t:Arabic
c:cricket
e:anarchism
TODO
Document ids start from zero. Every thread is assigned the document ids in the range [BLOCK * tI, BLOCK * (tI + 1)), where tI is the thread index (starting from zero).
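
As a rough illustration of the id assignment, the sketch below assumes each thread handles one contiguous, half-open block of `BLOCK` ids, so thread tI covers [BLOCK * tI, BLOCK * (tI + 1)). The function name `docRange` and its parameter names are hypothetical and do not come from the project's source.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open range [first, second) of
// document ids handled by one thread, assuming contiguous blocks of
// `block` ids per thread and zero-based thread indices.
std::pair<std::size_t, std::size_t> docRange(std::size_t block,
                                             std::size_t threadIndex) {
    return {block * threadIndex, block * (threadIndex + 1)};
}
```

For example, with a block size of 1000, thread 0 would cover ids 0 to 999 and thread 2 would cover ids 2000 to 2999.
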