dueringa / WikiWalker

Given a Wikipedia article, build a graph of article links

WikiWalker

Given a Wikipedia article, build a graph of article links

Input can be any Wikipedia URL. The results can be stored in a JSON cache file, which can be reused in successive runs: on each start, the cache file is read, combined with the newly fetched Wikipedia data, and stored again.
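The "read, combine, store" cycle can be pictured with a minimal sketch. It is not WikiWalker's actual implementation: the `LinkGraph` type and the `mergeGraphs` function are hypothetical names chosen for illustration, and the JSON reading and writing is left out.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory graph: article title -> titles it links to.
// The real cache is a JSON file; parsing it is omitted here.
using LinkGraph = std::map<std::string, std::set<std::string>>;

// Combine the cached graph with freshly fetched links.
// Articles already in the cache keep their links; new links are added,
// and std::set drops duplicates automatically.
LinkGraph mergeGraphs(LinkGraph cached, const LinkGraph& fetched)
{
  for(const auto& entry : fetched) {
    cached[entry.first].insert(entry.second.begin(), entry.second.end());
  }
  return cached;
}
```

Under these assumptions, the merged result is what would be written back to the cache file at the end of a run.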

Building

CMake is used as the build system. You need curl and either boost::program_options or getopt.

Note for Debian / Ubuntu(?) users

Debian ships an old version of UnitTest++, which also uses a custom pkg-config file. Unfortunately, this is completely incompatible with e.g. Gentoo and with the current version on GitHub; see also here.

Building under Windows

Why would you want to do that?

You'll need

  • curl - build from source following the instructions in winbuild
  • Boost libraries from www.boost.org

By default, CMake looks for dynamic Boost libraries, but the Boost bootstrap builds static libraries only. So you need to set Boost_USE_STATIC_LIBS=ON in addition to BOOST_ROOT when running cmake.

Alternatively, have a look at vcpkg, a tool to build and install libraries under Windows, and make them findable by cmake.

Generating graphs with Graphviz

Since graphs can get very wide, it's recommended to unflatten the graph first:

unflatten -l5 file.dot | dot ...

Where 5 is the "depth" the links get distributed to.

Also, have a look at Gephi.
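For orientation, here is a rough sketch of how a link graph could be written as a DOT file suitable for the unflatten / dot pipeline. It is an illustration only: `LinkGraph`, `escapeDot` and `writeDot` are hypothetical names, and WikiWalker's actual DOT output may look different.

```cpp
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical in-memory graph: article title -> titles it links to.
using LinkGraph = std::map<std::string, std::set<std::string>>;

// Escape double quotes so a title can be used as a quoted DOT identifier.
// Other special cases (e.g. a trailing backslash) are not handled here.
std::string escapeDot(const std::string& title)
{
  std::string out;
  for(char c : title) {
    if(c == '"') {
      out += '\\';
    }
    out += c;
  }
  return out;
}

// Write one directed edge per article link.
void writeDot(const LinkGraph& graph, std::ostream& os)
{
  os << "digraph G {\n";
  for(const auto& entry : graph) {
    for(const auto& target : entry.second) {
      os << "  \"" << escapeDot(entry.first) << "\" -> \""
         << escapeDot(target) << "\";\n";
    }
  }
  os << "}\n";
}
```

Passing a `std::ofstream` opened on file.dot as the stream would give a file that can be fed to unflatten.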

Formatting

The root directory contains a .clang-format file, which can be used to reformat the source code with clang-format. Alternatively, use the cmake target clang-format-source.

License: MIT License

Languages: C++ 94.7%, CMake 5.2%, Makefile 0.1%