Wraith spidering mode checks the entire website even though paths are specified in spider.yaml
khana25 opened this issue
Ayub Khan commented
I want the spider to crawl and check only the paths under '/new-homes', not every path on the website.
At the moment it checks the entire website rather than only the paths under '/new-homes'.
An example of my spider.yaml file is given below.
Reporting a problem? Please describe the issue above, and complete the following checklist so that we can help you more quickly.
Issue checklist:
- I have validated my config file against YAML Validator to make sure it is valid YAML.
- I have run the `wraith info` command and pasted the output below:
/new-homes
/
/new-homes/developments-by-county
/my-home
/my-home/sign-in
/new-homes/forthcoming-developments
/new-apartments
/luxury-new-homes-apartments
/the-buying-process/our-team
/new-homes/completed-developments
/the-buying-process/meeting-your-expectations
/the-buying-process/step-by-step-process
/the-buying-process/mortgage-payment-calculator
/purchasing-schemes/the-schemes
/purchasing-schemes/help-to-buy
/the-berkeley-difference/the-difference
/the-berkeley-difference/berkeley-overview
/the-berkeley-difference/berkeley-approach
/the-berkeley-difference/world-class-places
/about-berkeley-group
/investor-information
/sustainability
/media-centre
/the-buying-process
/the-buying-process/our-team
/purchasing-schemes
/purchasing-schemes/the-schemes
/property-developers/berkeley
/the-berkeley-difference
/the-berkeley-difference/the-difference
/property-developers/st-george
/property-developers/st-edward
/property-developers/st-james
/property-developers/st-joseph
/property-developers/st-william
/the-queens-award
/about-berkeley-group/our-vision
/about-berkeley/careers
/accessibility
/sitemap
/legal
/privacy-policy
/about-berkeley-group/contact-us
/modern-slavery-statement
/cookie-policy
/new-homes/buckinghamshire/taplow/taplow-riverside
/new-homes/london/tower-bridge/one-tower-bridge
/new-homes/west-sussex/horsham/highwood
/new-homes/london
/new-homes/london/twickenham/brewery-gate
/press-releases/2018/local-schoolchildren-enjoy-new-facilities
/media-centre/press-releases
/press-releases/2017/berkeley-named-one-of-britains-most-admired-companies
/media-centre/press-releases
/press-releases/2018/new-development-launches-in-blackheath
/media-centre/press-releases
/media-centre/press-releases
/new-homes/berkshire
/new-homes/buckinghamshire
- I have run the command in verbose mode (by adding `verbose: true` to my config) and pasted the output below:
paste results here
- [Y] I have pasted the contents of my config file below:
##############################################################
##############################################################
# This is an example configuration provided by Wraith.
# Feel free to amend for your own requirements.
# ---
# This particular config is intended to demonstrate how
# to use Wraith in 'spider' mode.
##############################################################
##############################################################
# Add as many domains as necessary. Key will act as a label
domains:
  my_site: 'https://www.berkeleygroup.co.uk/new-homes'
# Notice the absence of a `paths` property. When no paths are provided, Wraith defaults to
# spidering mode to check your entire website.
paths:
  new-homes: /
# A list of URLs to skip when spidering.
# Ruby regular expressions can be used, if prefixed with `!ruby/regexp` as defined in the YAML Cookbook
# See http://www.yaml.org/YAML_for_ruby.html#regexps
imports: "spider_paths.yml"
# the filename of the spider file to use. Default: spider.txt
spider_file: example_com_spider.txt
# the number of days to keep the site spider file
spider_days: 10
# amount of fuzz ImageMagick will use when comparing images. A higher fuzz makes the comparison less strict.
fuzz: '20%'
# the maximum acceptable level of difference (in %) between two images.
# Wraith considers it a failure if an image diff goes above this threshold.
threshold: 5
# screen widths (and optional height) to resize the browser to before taking the screenshot
screen_widths:
  - 320x568 # iPhone 5
  - 375x667 # iPhone 6/7/8
  - 414x736 # iPhone 6/7/8 Plus
  - 375x812 # iPhone X
  - 768x1024 # iPad
  - 834x1112 # iPad 10.5
  - 1024x1366 # iPad 12.5
  - 2560x1440 # iMac
  - 1440x900 # Desktop
  - 1366x768 # Desktop
  - 1920x1080 # Desktop
# the engine to run Wraith with.
browser: "phantomjs"
# the directory that your latest screenshots will be stored in
directory: 'shots'
# choose how results are displayed in the gallery (default is `alphanumeric` if omitted)
# Different screen widths are always grouped together.
# Options:
# alphanumeric - all paths (with or without a difference) are shown, sorted by path
# diffs_first - all paths (with or without a difference) are shown, sorted by difference size (largest first)
# diffs_only - only paths with a difference are shown, sorted by difference size (largest first)
mode: diffs_only
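
A possible way to limit the crawl, offered as an untested sketch rather than a confirmed fix: the config above keeps the template comment about URLs to skip when spidering, but no skip list follows it. Assuming the installed Wraith version supports a `spider_skips` key, that each pattern is matched against the path of every link found, and that the YAML loader accepts the `!ruby/regexp` tag, a negative lookahead could skip every path that does not start with `/new-homes`:

```yaml
# Assumption: `spider_skips` is read by the spider and each pattern is
# tested against the path of every link found during the crawl.
spider_skips:
  # Skip any path that does not start with /new-homes
  # (for example /my-home or /the-buying-process/our-team).
  - !ruby/regexp /^\/(?!new-homes(\/|$))/
```

If this behaves as assumed, paths such as `/new-homes/london` would still be crawled, while `/`, `/new-apartments`, and `/media-centre` would be skipped. The `paths` block may also need to be removed, since the template comment states that spidering mode is only used when no paths are provided.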