gniemann / webcrawler

Team Gamma Graphical Web Crawler

Home Page: https://gammacrawler.appspot.com

We have created a graphical web crawler that lets users build an interactive map of the interconnections (hyperlinks) among web pages. Users enter a starting URL and other parameters in a form, and choose either a breadth-first or a depth-first search from that URL. When the form is submitted, the web crawler generates the map, which is constructed dynamically within a grid area for the user to view. Users can zoom the grid in and out, drag it around, and drag individual map nodes. Highlighting a node shows the URL it represents, and clicking a node opens that URL in a separate window.
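To make "interconnections (hyperlinks)" concrete, here is a minimal sketch of how the links on a single page might be collected. It uses only the Python standard library; the names `LinkExtractor` and `extract_links` are hypothetical and are not taken from this repository, whose actual parsing code may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the de-duplicated outgoing links of one page, in page order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Each returned URL would correspond to one edge from the current page's node to another node in the map.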

The features of the Graphical Web Crawler are:

  1. A front-end, client-side user interface that lets the user specify a starting URL, choose a depth-first or breadth-first crawl, and set a numeric limit that terminates the crawl.
  2. A back-end, server-side crawler that performs the requested crawl.
  3. The back-end transmits results to the front-end, which displays them graphically for the user to inspect.
  4. The URLs of the crawled pages/nodes will be displayed, and the user may click them to open them in a new tab or window.
  5. An optional keyword that the back-end crawler will use as a sentinel to end the crawl early, i.e. before reaching the numeric limit.
  6. The client-side user interface should use cookies to store previous starting pages, in case the user wishes to re-crawl them.
  7. The UI will build and display the graph in real time.
  8. The graph will use a physics simulation to organize itself interactively, in real time.
  9. Nodes will display the site favicon, if available.
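The crawl options in features 1 and 5 (search order, numeric limit, keyword sentinel) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual back-end: `crawl` and its parameters are hypothetical names, and `get_page` is an assumed caller-supplied function that returns a page's text and its outgoing links, so the sketch runs without network access.

```python
from collections import deque


def crawl(start_url, get_page, mode="bfs", limit=10, keyword=None):
    """Visit pages from start_url and return (visited, edges).

    get_page(url) is an assumed callback returning (text, links).
    edges holds (parent, child) pairs for the link through which each
    page was first reached, which is enough to draw the crawl as a tree.
    """
    if mode not in ("bfs", "dfs"):
        raise ValueError("mode must be 'bfs' or 'dfs'")

    frontier = deque([(None, start_url)])
    seen = set()
    visited = []
    edges = []

    while frontier and len(visited) < limit:
        # A queue (pop from the left) gives breadth-first order;
        # a stack (pop from the right) gives depth-first order.
        parent, url = frontier.popleft() if mode == "bfs" else frontier.pop()
        if url in seen:
            continue
        seen.add(url)

        text, links = get_page(url)
        visited.append(url)
        if parent is not None:
            edges.append((parent, url))

        # The keyword acts as a sentinel: stop before reaching the limit.
        if keyword and keyword in text:
            break

        # For depth-first, push in reverse so the first link is tried first.
        children = links if mode == "bfs" else list(reversed(links))
        for link in children:
            if link not in seen:
                frontier.append((url, link))

    return visited, edges
```

With an in-memory site such as `{"a": ("home", ["b", "c"]), "b": ("about", ["d"]), ...}` passed as `get_page=pages.__getitem__`, the only difference between the two modes is which end of the frontier is popped.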

The working system is hosted on Google App Engine and available at https://gammacrawler.appspot.com

Languages

JavaScript 94.5%, Python 4.8%, HTML 0.4%, CSS 0.2%