Index page always gets scanned, even if it's not in the provided URLs
BennyAlex opened this issue · comments
Describe the bug
When given a list of URLs, e.g.:
```js
[
  'https://www.guetersloh.de/de/index.php',
  'https://www.guetersloh.de/de/datenschutz.php',
  'https://www.guetersloh.de/de/index.php#anchorContent',
  'https://www.guetersloh.de/de/leben-in-guetersloh.php',
  'https://www.guetersloh.de/de/leben-in-guetersloh/ehrenamt.php',
]
```
it still scans the "/" page and I get an additional sixth report:
"requestedUrl": "https://www.guetersloh.de/",
"finalUrl": "https://www.guetersloh.de/",
```
D Creating Unlighthouse                                                          Unlighthouse 19:27:03
D Setting Unlighthouse Site URL [Site: https://www.guetersloh.de]                Unlighthouse 19:27:04
starting unlighthouse
D Starting Unlighthouse [Server: undefined Site: https://www.guetersloh.de Debug: true] Unlighthouse 19:27:04
i The url config has been provided with 5 paths for scanning. Disabling sitemap, sampling and crawler. Unlighthouse 19:27:04
D Route has been queued. Path: / Name: _index.                                   Unlighthouse 19:27:04
D Route has been queued. Path: /de/datenschutz.php Name: de-slug.                Unlighthouse 19:27:04
D Route has been queued. Path: /de/index.php Name: de-slug.                      Unlighthouse 19:27:04
D Route has been queued. Path: /de/leben-in-guetersloh.php Name: de-slug.        Unlighthouse 19:27:04
D Route has been queued. Path: /de/leben-in-guetersloh/ehrenamt.php Name: de-leben-in-guetersloh-slug.
```
Reproduction
No response
System / Nuxt Info
No response
Try using:

```js
scanner: {
  // exclude specific routes
  exclude: [
    '^https:\/\/www\.guetersloh\.de\/$'
  ]
}
```