theimberger / wikipedia_crawler

A data visualization tool that traverses links in Wikipedia to find connections between a start page and an end page

Wikipedia Crawler

Check Out the Live Demo

About

Wikipedia Crawler is a bot that finds a path between two Wikipedia pages supplied by the user. It builds the path by following links from one page to the next, then visualizes the result as a node tree. Note: API queries to Wikipedia are delayed with a setTimeout in the lifecycle file (line 153), because without this short pause between requests the app can feel jarring.
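
As a rough illustration of how a path could be read back out of the crawl, the sketch below assumes each visited page is stored with the title of the page it was reached from. The buildPath helper and the plain Map are hypothetical stand-ins, not code from this repo.

```javascript
// Hypothetical helper: rebuild the path from the end page back to the start
// by following each node's recorded parent. `nodes` is a plain Map here,
// standing in for whatever lookup structure the crawler uses.
function buildPath(nodes, endTitle) {
  const path = [];
  const seen = new Set(); // guards against accidental parent cycles
  let current = nodes.get(endTitle);

  while (current && !seen.has(current.title)) {
    path.unshift(current.title); // prepend so the start page ends up first
    seen.add(current.title);
    current = current.parent === null ? undefined : nodes.get(current.parent);
  }
  return path;
}

// Example data: Cat links to Mammal, which links to Animal.
const nodes = new Map([
  ["Cat", { title: "Cat", parent: null }],
  ["Mammal", { title: "Mammal", parent: "Cat" }],
  ["Animal", { title: "Animal", parent: "Mammal" }],
]);

console.log(buildPath(nodes, "Animal").join(" -> ")); // Cat -> Mammal -> Animal
```

Walking parent links costs one lookup per step, which is why a fast lookup structure matters for the crawler.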

Notable Code

The Crawler uses a custom data structure called a PolyHash (see the file in the javascript/modules folder), a hash map with bucket arrays that gives constant-time lookup and insertion on average. Its key methods, add() and get(), are shown below.

class PolyHash {

  ...

  add(title, parent = this.currentParent, children = [], image = false) {
    if (this.origin === "") {
      this.origin = title;
      this.currentParent = title;
    }

    let addition = {
      title,
      parent,
      children,
      image,
    };

    let bucket = Math.floor(this.hashString(title) % this.map.length);
    if (this.map[bucket] === null) {
      this.map[bucket] = [];
    }
    this.map[bucket].push(addition);
    this.count++;
    if (this.count > this.map.length) {
      this.resizeMap();
    }
  }

  ...

  get(string) {
    let bucket = Math.floor(this.hashString(string) % this.map.length);
    let match = false;

    if (this.map[bucket] === null) return false;

    this.map[bucket].forEach((node) => {
      if (node.title === string) {
        match = node;
      }
    });

    return match;
  }

  ...

}
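
The excerpt above omits hashString() and resizeMap(). The following self-contained sketch shows one way those pieces could fit together. The class name SketchHash, the polynomial rolling hash, and the doubling resize policy are all assumptions for illustration, not the repo's actual implementation.

```javascript
// A minimal separate-chaining hash map, loosely modeled on the excerpt above.
class SketchHash {
  constructor(size = 8) {
    this.map = new Array(size).fill(null); // null marks an empty bucket
    this.count = 0;
  }

  // Assumed hash: polynomial rolling hash kept in the unsigned 32-bit range.
  hashString(string) {
    let hash = 0;
    for (let i = 0; i < string.length; i++) {
      hash = (Math.imul(hash, 31) + string.charCodeAt(i)) >>> 0;
    }
    return hash;
  }

  bucketFor(string) {
    return this.hashString(string) % this.map.length;
  }

  add(title, parent = null) {
    const bucket = this.bucketFor(title);
    if (this.map[bucket] === null) this.map[bucket] = [];
    this.map[bucket].push({ title, parent });
    this.count++;
    if (this.count > this.map.length) this.resizeMap();
  }

  get(title) {
    const chain = this.map[this.bucketFor(title)];
    if (chain === null) return false;
    return chain.find((node) => node.title === title) || false;
  }

  // Assumed resize policy: double the bucket array and re-insert every node.
  resizeMap() {
    const old = this.map;
    this.map = new Array(old.length * 2).fill(null);
    for (const chain of old) {
      if (chain === null) continue;
      for (const node of chain) {
        const bucket = this.bucketFor(node.title);
        if (this.map[bucket] === null) this.map[bucket] = [];
        this.map[bucket].push(node);
      }
    }
  }
}

const pages = new SketchHash(2);
pages.add("Cat");
pages.add("Mammal", "Cat");
pages.add("Animal", "Mammal"); // third entry exceeds 2 buckets, so the map doubles
console.log(pages.get("Animal").parent); // Mammal
console.log(pages.get("Dog")); // false
console.log(pages.map.length); // 4
```

Resizing whenever the entry count exceeds the bucket count keeps the chains short, which is what makes lookups constant time on average rather than in the worst case.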

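Other notable code includes the RunFactory function, which manages AJAX requests. Given a page title, it returns a function that sets the current parent, fetches the page, and removes the query from the queue. Queries are stored in the FetchQue and fired off sequentially using setTimeout.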

  const RunFactory = (title) => () => {
    LinkMap.currentParent = title;
    AjaxUtils.fetchWikiPage(title, Run);
    FetchQue.shift();
  };
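
The queueing code itself is not shown here, so the sketch below is only one possible way to fire queued requests one at a time with a pause between them. createFetchQueue, fetchPage, delayMs, and schedule are hypothetical names. fetchPage stands in for AjaxUtils.fetchWikiPage, and the 250 ms delay is an arbitrary example value.

```javascript
// Hypothetical sketch of a throttled request queue.
// `schedule` defaults to setTimeout so the delay mechanism can be swapped out.
function createFetchQueue(fetchPage, delayMs = 250, schedule = setTimeout) {
  const queue = [];
  let running = false;

  const runNext = () => {
    const job = queue.shift();
    if (job === undefined) {
      running = false; // queue drained; the next enqueue restarts it
      return;
    }
    job();
    schedule(runNext, delayMs); // pause before firing the next request
  };

  // Same factory shape as RunFactory: capture the title, return a thunk.
  const jobFor = (title) => () => fetchPage(title);

  return {
    enqueue(title) {
      queue.push(jobFor(title));
      if (!running) {
        running = true;
        schedule(runNext, 0);
      }
    },
    get pending() {
      return queue.length;
    },
  };
}

const fetched = [];
const fetchQueue = createFetchQueue((title) => fetched.push(title), 250);
fetchQueue.enqueue("Cat");
fetchQueue.enqueue("Mammal");
// "Cat" is fetched on the next timer tick, "Mammal" roughly 250 ms later.
```

Returning a function from a factory lets each queued job remember its own title, so the queue only has to call jobs in order without tracking arguments.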
