Zero-configuration test automation tool.
Page Walker crawls a website using Google Chrome browser, analyze data from Developer tools and performs other tests.
The result is an interactive report - see example report.
- Network errors while loading page
- Requests to blacklisted domains (malware)
- JavaScript runtime exceptions
- DevTools Console errors
- Broken links, including links to blacklisted domains
- HTML/CSS validation errors - before and after JavaScript execution
- Project homepage: rafal-qa.com/page-walker/
- Source code: github.com/rafal-qa/page-walker
- Stand-alone executables: github.com/rafal-qa/page-walker/releases
- Getting started
- Running from source code
- Features
- Configuration
- Limitations and known problems
- More information
- TODO
- License
- Acknowledgments
For quick start use stand-alone version. It's a compiled application with all dependencies. No installation or configuration needed (it works on reasonable defaults). Just run.
- Current Chrome/Chromium browser (or at least version 66) must be installed.
- For using HTML validator Java 8 (or newer) is required. If you disable HTML validation, it works without Java. Download Java 8.
You don't need anything else. Download Page Walker, unpack and run.
Double-click page-walker
and it will run in interactive mode. You will be prompted to provide website URL and some basic information.
You can also run it in console with arguments. Open command line (cmd, Powershell), navigate to directory with page-walker
and run for example:
.\page-walker.exe -u http://example.com -p 5 --headless yes
On Windows, Chrome browser location is detected automatically.
Open terminal, navigate to directory with page-walker
and run for example:
./page-walker -u http://example.com -p 5 --headless yes
On Linux, default Chrome browser is set to google-chrome
. Maybe on your system is different and you need to provide this:
./page-walker -u http://example.com -p 5 --chrome-binary chromium-browser
Reports (test results) are saved to output
directory. For example: output/2018-07-04_17-45-08_251870/report/index.html
. For your convince you will notice file output/latest_report.html
redirecting to the latest report.
Page Walker is a cross-platform Python application. You can run it on any system (Windows, Linux, Mac).
- Python (at least 2.7 or 3.5)
- Python modules
requests
websocket-client
- v.Nu validator
- Install (and upgrade) Python modules
pip install --upgrade --user -r requirements.txt
- Download v.Nu validator
vnu.jar_[version].zip
and unpack tolib
directory (lib/vnu/vnu.jar
)
Open console, navigate to directory with page-walker.py
and run for example:
python page-walker.py -u http://example.com -p 5
This is not list of all features. See Configurable parameters table for more information.
The result of page-walker
run is report like this. It's a stand-alone HTML/CSS/JavaScript page independent of Page Walker. You can browse it on local computer, send to someone, publish on your intranet. You can also easily integrate it with Jenkins (or other CI tools) - note output/latest_report.html
file and --keep-previous
option.
Running Chrome in headless mode keeps browser window invisible. You can run tests on remote command line out-of-the-box.
Feature called 2-step HTML validation
- First step: Validate raw HTML, original page source received from server.
- Second step: Validate HTML code exported from DOM after JavaScript execution. It saves errors that did not appear on first step. On most websites JavaScript modify page structure and now you can check if these changes are correct. You will also see that possible errors are related to JavaScript, because they did not occur on raw HTML.
For HTML validation Nu Html Checker (v.Nu) is used. The same library as used by W3C Markup Validator.
Run multiple instances of same Chrome browser in parallel (or different versions if you wish). Just run multiple page-walker
commands with different Chrome remote debugger port for every instance. If you run second instance with the same port number as already running, first instance will be stopped.
This feature helps collect more data and detect more bugs. When page is loaded, scrolling to the bottom of page occurs. This action can fire more JavaScript events, AJAX requests, show more images. It you don't like this idea, you can disable it.
By default Page Walker is finding links to other pages within domain and visiting them. When you have specific list of pages to visit in specific order, you can use it. Save this list in text file, one page per line. Every page is relative to domain. For URL http://example.com/login
domain is http://example.com
and page is /login
. Provide path to this file in argument --pages-list
.
Pay attention to --list-only
argument.
- If set to
yes
: only pages from list will be visited and no other pages. Option maximum number of pages to visit has no effect. - If set to
no
: option maximum number of pages to visit is taken into account. After visiting pages from list, pages found automatically will be visited.
You can login in different ways:
- Provide HTTP Basic authentication credentials
- Fill in the login form using Initial actions
- Add custom cookies with authentication data
- Domain blacklist - simple malware detection
- Custom cookies - add cookies to browser before test starts
- Initial actions - set text, click, submit before test starts
You can run Page Walker without any parameters. It's the same as double-clicking stand-alone executable on Windows. You will be asked to provide URL of website to test, number of pages to visit and whether to run in headless mode.
- Hard-coded defaults. Java and Chrome binary are dynamic and depends on operating system. On Windows location of installed Chrome is auto-detected.
- Configuration file
config/main.ini
. It overwrites hard-coded defaults. If parameter is empty or omitted, hard-coded default will be used. Use it to configure your environment: custom location of Chrome, Java, parameters that you do not plan to change. - Command line arguments. It overwrites hard-coded defaults and configuration file parameters. Run with
-h
parameter to see all available options.
Command_line argument | config/main.ini |
Default value | Description |
---|---|---|---|
-u or --url |
start_url |
URL of first page to visit, with http(s):// . |
|
-p or --pages |
max_number_pages |
10 | Maximum number of pages to visit. |
-l or --headless |
chrome_headless |
no | [yes/no] Run Chrome in headless mode. |
--pages-list |
pages_list_file |
[Optional] File containing list of pages to visit. Pages relative to main domain, starting with / . |
|
--list-only |
pages_list_only |
yes | [yes/no] Visit all and only pages from file, regardless of maximum number of pages value. Option has no effect if file with pages list is not set. |
--wait-after |
wait_time_after_load |
1 | Wait time (in seconds) after page was loaded (browser received Load event). Even after Load event the browser usually downloads some data so it's better to set it for 1-3 seconds. |
--scroll-after |
scroll_after_load |
yes | [yes/no] Scroll to bottom of page after page was loaded, before wait time. While scrolling browser can receive more events and data to analyze. |
--keep-previous |
keep_previous_data |
yes | [yes/no] Keep data from previous program run. If no, output directory will be cleaned before run. |
--window-size |
window_size |
1366x768 | Size of browser window. |
--close-on-finish |
chrome_close_on_finish |
yes | [yes/no] Close browser after finish. For debugging you can set to no to interact with browser after app has finished running. |
--chrome-port |
chrome_debugging_port |
9222 | Chrome remote debugger port number. To run multiple tests in parallel set different port for every instance. |
--chrome-timeout |
chrome_timeout |
30 | Chrome connection timeout in seconds. How long to wait for Load event. |
--chrome-binary |
chrome_binary |
Auto-detected on Windows, google-chrome otherwise. |
Path to Chrome executable file. |
--chrome-ignore-cert |
chrome_ignore_cert |
no | [yes/no] Ignore SSL errors. Set to yes if you want to test a website with invalid SSL certificate. |
--http-auth |
http_basic_auth_data |
HTTP Basic authentication credentials in format login:password |
|
--cookies-file |
custom_cookies_file |
Config file with custom cookies definition, more: Custom cookies | |
--initial-actions-file |
initial_actions_file |
Config file with initial actions definition, more: Initial actions | |
--initial-actions-url |
initial_actions_url |
the same as start_url |
URL of the page on which to perform initial actions. Option has no effect if initial_actions_file was not set. |
--validate |
validator_enabled |
yes | [yes/no] Enable HTML validator. Set to no if you don't have Java installed. |
--check-css |
validator_check_css |
yes | [yes/no] Check also inline CSS. Option has no effect if HTML validator is disabled. |
--validator-warnings |
validator_show_warnings |
yes | [yes/no] Report also warnings (in addition to errors). Option has no effect if HTML validator is disabled. |
validator_vnu_jar |
lib/vnu/vnu.jar |
Path to vNu HTML validator JAR file. | |
--java-binary |
java_binary |
Java on Windows, java otherwise. |
Path to Java executable file. To use custom Java unpacked to lib directory, set for example to lib/jre{ver}/bin/java |
java_stack_size |
4096 | Java stack size [KB]. If you experience stack overflow errors while validating extremely large HTML page, increase it. | |
--check-links |
check_external_links |
yes | [yes/no] Check HTTP response status of external links. |
check_external_links_timeout |
10 | External links checking connection timeout in seconds. | |
--domain-blacklist |
domain_blacklist_enabled |
yes | [yes/no] Check if domains are blacklisted due to malware, scam. |
domain_blacklist_cache_expiry |
24 | Domain blacklist cache expiry in hours. | |
domain_blacklist_auto_update |
yes | Domain blacklist auto update. | |
domain_blacklist_url |
URL | URL with current domain lists. | |
-v or --version |
Show app and Python version and exit. | ||
-h or --help |
Show all command line options. |
For boolean values (yes/no) you can also use:
- instead of
yes
:y
,true
,1
- instead of
no
:n
,false
,0
Replace page-walker
with:
.\page-walker.exe
when running Windows stand-alone version./page-walker
when running Linux stand-alone versionpython page-walker.py
when running from source code
Visit max. 100 pages in headless mode
page-walker -u http://example.com/ -p 100 --headless yes
Visit all pages from list in file with no HTML validation
page-walker -u http://example.com/ --pages-list config/pages.txt --validate no
Run with custom Java (downloaded to lib
directory) and Chrome (installed Beta version on Linux)
page-walker -u http://example.com/ --java-binary lib/jdk-10.0.1/bin/java --chrome-binary google-chrome-beta
Run 2 website tests in parallel (run commands in separate consoles)
page-walker -u http://example.com/ -p 100 --headless yes
page-walker -u http://example.org/ -p 100 --headless yes --chrome-port 9223
Chrome logs are saved to output/{date}_{time}/chrome_run.log
. Probably there is some Chrome issue, not related to Page Walker. For debugging purposes run Chrome from console. If chrome_run.log
is empty, maybe you are trying to run Chrome in non-headless mode on remote server without GUI.
On Linux google-chrome
is default Chrome location. Add custom name for example: --chrome-binary chromium-browser
. It may be necessary to specify the full path.
This error does not stop program, only appears in the console. It occurs when Chrome lost information about some response received previously. Usually caused by non HTTP redirection to other page.
You can see this error in HTML report. Sometimes lots of them. When using DevTools manually it's equivalent to (canceled)
error in Network tab. The reasons may be different, and sometimes difficult to reproduce. Read this great topic on StackOverflow. This can be related to some page issues, so it's worth analyzing.
What improvements you can expect in future releases.
- Developer guide, more technical details.
- Skipped errors, known errors you won't fix. Manually created list of errors that will not be reported.
This project is licensed under the MIT License - see the LICENSE file for details.
HTML report uses following libraries included in lib/pagewalker/report_template/lib/
:
- Semantic UI licensed under the MIT License, Copyright (c) 2015 Semantic Org
- jQuery licensed under the MIT License, Copyright JS Foundation and other contributors
- DataTables licensed under the MIT License, Copyright (C) 2008-2018, SpryMedia Ltd.
Nu Html Checker (v.Nu) is licensed under the MIT License, Copyright (c) 2007-2016 Mozilla Foundation