Page Walker

Zero-configuration test automation tool.

Page Walker crawls a website using Google Chrome browser, analyze data from Developer tools and performs other tests.

The result is an interactive report - see example report.

Types of detected defects

Network errors while loading page
Requests to blacklisted domains (malware)
JavaScript runtime exceptions
DevTools Console errors
Broken links, including links to blacklisted domains
HTML/CSS validation errors - before and after JavaScript execution

Getting started

For quick start use stand-alone version. It's a compiled application with all dependencies. No installation or configuration needed (it works on reasonable defaults). Just run.

Prerequisites

Current Chrome/Chromium browser (or at least version 66) must be installed.
For using HTML validator Java 8 (or newer) is required. If you disable HTML validation, it works without Java. Download Java 8.

Stand-alone version

You don't need anything else. Download Page Walker, unpack and run.

Windows

Double-click page-walker and it will run in interactive mode. You will be prompted to provide website URL and some basic information.

You can also run it in console with arguments. Open command line (cmd, Powershell), navigate to directory with page-walker and run for example:

.\page-walker.exe -u http://example.com -p 5 --headless yes

On Windows, Chrome browser location is detected automatically.

Linux

Open terminal, navigate to directory with page-walker and run for example:

./page-walker -u http://example.com -p 5 --headless yes

On Linux, default Chrome browser is set to google-chrome. Maybe on your system is different and you need to provide this:

./page-walker -u http://example.com -p 5 --chrome-binary chromium-browser

HTML reports

Reports (test results) are saved to output directory. For example: output/2018-07-04_17-45-08_251870/report/index.html. For your convince you will notice file output/latest_report.html redirecting to the latest report.

Running from source code

Page Walker is a cross-platform Python application. You can run it on any system (Windows, Linux, Mac).

Requirements

Python (at least 2.7 or 3.5)
Python modules
- requests
- websocket-client
v.Nu validator

Installation

Install (and upgrade) Python modules

pip install --upgrade --user -r requirements.txt

Download v.Nu validator vnu.jar_[version].zip and unpack to lib directory (lib/vnu/vnu.jar)

Running

Open console, navigate to directory with page-walker.py and run for example:

python page-walker.py -u http://example.com -p 5

Features

This is not list of all features. See Configurable parameters table for more information.

Interactive HTML reports

The result of page-walker run is report like this. It's a stand-alone HTML/CSS/JavaScript page independent of Page Walker. You can browse it on local computer, send to someone, publish on your intranet. You can also easily integrate it with Jenkins (or other CI tools) - note output/latest_report.html file and --keep-previous option.

Headless mode

Running Chrome in headless mode keeps browser window invisible. You can run tests on remote command line out-of-the-box.

Validate HTML before/after JavaScript execution

Feature called 2-step HTML validation

First step: Validate raw HTML, original page source received from server.
Second step: Validate HTML code exported from DOM after JavaScript execution. It saves errors that did not appear on first step. On most websites JavaScript modify page structure and now you can check if these changes are correct. You will also see that possible errors are related to JavaScript, because they did not occur on raw HTML.

For HTML validation Nu Html Checker (v.Nu) is used. The same library as used by W3C Markup Validator.

Parallel test execution

Run multiple instances of same Chrome browser in parallel (or different versions if you wish). Just run multiple page-walker commands with different Chrome remote debugger port for every instance. If you run second instance with the same port number as already running, first instance will be stopped.

Scrolling page after load

This feature helps collect more data and detect more bugs. When page is loaded, scrolling to the bottom of page occurs. This action can fire more JavaScript events, AJAX requests, show more images. It you don't like this idea, you can disable it.

Custom list of pages

By default Page Walker is finding links to other pages within domain and visiting them. When you have specific list of pages to visit in specific order, you can use it. Save this list in text file, one page per line. Every page is relative to domain. For URL http://example.com/login domain is http://example.com and page is /login. Provide path to this file in argument --pages-list.

Pay attention to --list-only argument.

If set to yes: only pages from list will be visited and no other pages. Option maximum number of pages to visit has no effect.
If set to no: option maximum number of pages to visit is taken into account. After visiting pages from list, pages found automatically will be visited.

Login to restricted area

You can login in different ways:

Provide HTTP Basic authentication credentials
Fill in the login form using Initial actions
Add custom cookies with authentication data

Other features

Domain blacklist - simple malware detection
Custom cookies - add cookies to browser before test starts
Initial actions - set text, click, submit before test starts

Configuration

Interactive mode

You can run Page Walker without any parameters. It's the same as double-clicking stand-alone executable on Windows. You will be asked to provide URL of website to test, number of pages to visit and whether to run in headless mode.

3-level configuration

Hard-coded defaults. Java and Chrome binary are dynamic and depends on operating system. On Windows location of installed Chrome is auto-detected.
Configuration file config/main.ini. It overwrites hard-coded defaults. If parameter is empty or omitted, hard-coded default will be used. Use it to configure your environment: custom location of Chrome, Java, parameters that you do not plan to change.
Command line arguments. It overwrites hard-coded defaults and configuration file parameters. Run with -h parameter to see all available options.

Configurable parameters

Command_line argument	`config/main.ini`	Default value	Description
`-u` or `--url`	`start_url`		URL of first page to visit, with `http(s)://`.
`-p` or `--pages`	`max_number_pages`	10	Maximum number of pages to visit.
`-l` or `--headless`	`chrome_headless`	no	[yes/no] Run Chrome in headless mode.
`--pages-list`	`pages_list_file`		[Optional] File containing list of pages to visit. Pages relative to main domain, starting with `/`.
`--list-only`	`pages_list_only`	yes	[yes/no] Visit all and only pages from file, regardless of maximum number of pages value. Option has no effect if file with pages list is not set.
`--wait-after`	`wait_time_after_load`	1	Wait time (in seconds) after page was loaded (browser received `Load` event). Even after `Load` event the browser usually downloads some data so it's better to set it for 1-3 seconds.
`--scroll-after`	`scroll_after_load`	yes	[yes/no] Scroll to bottom of page after page was loaded, before wait time. While scrolling browser can receive more events and data to analyze.
`--keep-previous`	`keep_previous_data`	yes	[yes/no] Keep data from previous program run. If no, `output` directory will be cleaned before run.
`--window-size`	`window_size`	1366x768	Size of browser window.
`--close-on-finish`	`chrome_close_on_finish`	yes	[yes/no] Close browser after finish. For debugging you can set to no to interact with browser after app has finished running.
`--chrome-port`	`chrome_debugging_port`	9222	Chrome remote debugger port number. To run multiple tests in parallel set different port for every instance.
`--chrome-timeout`	`chrome_timeout`	30	Chrome connection timeout in seconds. How long to wait for `Load` event.
`--chrome-binary`	`chrome_binary`	Auto-detected on Windows, `google-chrome` otherwise.	Path to Chrome executable file.
`--chrome-ignore-cert`	`chrome_ignore_cert`	no	[yes/no] Ignore SSL errors. Set to yes if you want to test a website with invalid SSL certificate.
`--http-auth`	`http_basic_auth_data`		HTTP Basic authentication credentials in format `login:password`
`--cookies-file`	`custom_cookies_file`		Config file with custom cookies definition, more: Custom cookies
`--initial-actions-file`	`initial_actions_file`		Config file with initial actions definition, more: Initial actions
`--initial-actions-url`	`initial_actions_url`	the same as `start_url`	URL of the page on which to perform initial actions. Option has no effect if `initial_actions_file` was not set.
`--validate`	`validator_enabled`	yes	[yes/no] Enable HTML validator. Set to no if you don't have Java installed.
`--check-css`	`validator_check_css`	yes	[yes/no] Check also inline CSS. Option has no effect if HTML validator is disabled.
`--validator-warnings`	`validator_show_warnings`	yes	[yes/no] Report also warnings (in addition to errors). Option has no effect if HTML validator is disabled.
	`validator_vnu_jar`	`lib/vnu/vnu.jar`	Path to vNu HTML validator JAR file.
`--java-binary`	`java_binary`	`Java` on Windows, `java` otherwise.	Path to Java executable file. To use custom Java unpacked to `lib` directory, set for example to `lib/jre{ver}/bin/java`
	`java_stack_size`	4096	Java stack size [KB]. If you experience stack overflow errors while validating extremely large HTML page, increase it.
`--check-links`	`check_external_links`	yes	[yes/no] Check HTTP response status of external links.
	`check_external_links_timeout`	10	External links checking connection timeout in seconds.
`--domain-blacklist`	`domain_blacklist_enabled`	yes	[yes/no] Check if domains are blacklisted due to malware, scam.
	`domain_blacklist_cache_expiry`	24	Domain blacklist cache expiry in hours.
	`domain_blacklist_auto_update`	yes	Domain blacklist auto update.
	`domain_blacklist_url`	URL	URL with current domain lists.
`-v` or `--version`			Show app and Python version and exit.
`-h` or `--help`			Show all command line options.

Boolean values

For boolean values (yes/no) you can also use:

instead of yes: y, true, 1
instead of no: n, false, 0

Examples

Replace page-walker with:

.\page-walker.exe when running Windows stand-alone version
./page-walker when running Linux stand-alone version
python page-walker.py when running from source code

Visit max. 100 pages in headless mode

page-walker -u http://example.com/ -p 100 --headless yes

Visit all pages from list in file with no HTML validation

page-walker -u http://example.com/ --pages-list config/pages.txt --validate no

Run with custom Java (downloaded to lib directory) and Chrome (installed Beta version on Linux)

page-walker -u http://example.com/ --java-binary lib/jdk-10.0.1/bin/java --chrome-binary google-chrome-beta

Run 2 website tests in parallel (run commands in separate consoles)

page-walker -u http://example.com/ -p 100 --headless yes
page-walker -u http://example.org/ -p 100 --headless yes --chrome-port 9223

Limitations and known problems

ERROR: Unable to connect to Chrome remote debugger (see log file for details)

Chrome logs are saved to output/{date}_{time}/chrome_run.log. Probably there is some Chrome issue, not related to Page Walker. For debugging purposes run Chrome from console. If chrome_run.log is empty, maybe you are trying to run Chrome in non-headless mode on remote server without GUI.

ERROR: Chrome not found at location: google-chrome

On Linux google-chrome is default Chrome location. Add custom name for example: --chrome-binary chromium-browser. It may be necessary to specify the full path.

[FAIL] Network.getResponseBody | No resource with given identifier found

This error does not stop program, only appears in the console. It occurs when Chrome lost information about some response received previously. Usually caused by non HTTP redirection to other page.

net::ERR_ABORTED

You can see this error in HTML report. Sometimes lots of them. When using DevTools manually it's equivalent to (canceled) error in Network tab. The reasons may be different, and sometimes difficult to reproduce. Read this great topic on StackOverflow. This can be related to some page issues, so it's worth analyzing.

More information

TODO

What improvements you can expect in future releases.

Developer guide, more technical details.
Skipped errors, known errors you won't fix. Manually created list of errors that will not be reported.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

HTML report uses following libraries included in lib/pagewalker/report_template/lib/:

jQuery licensed under the MIT License, Copyright JS Foundation and other contributors
DataTables licensed under the MIT License, Copyright (C) 2008-2018, SpryMedia Ltd.