facebook / docusaurus

Easy to maintain open source documentation websites.

Home Page: https://docusaurus.io

Add trailing slash to auto generated sitemap.xml for directories only

John-fg opened this issue · comments

commented

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Bing.com shows a redirection message with HTTP 301 for every page because each link in sitemap.xml is missing a trailing slash. Instead of the actual page content, the Bing.com search results show the redirect text: "Moved Permanently. The document has moved here."

I'd like to:

  • prevent 301 redirects and use direct links in the generated sitemap.xml
  • have Docusaurus generate trailing slashes in sitemap.xml for directories only.

Also see #4134

Reproducible demo

No response

Steps to reproduce

Add trailingSlash: true to docusaurus.config.js.

Expected behavior

Trailing slashes should only be used for actual directories.

Actual behavior

When trailingSlash is added to docusaurus.config.js inside const config = {..}, the build fails with error messages:

const config = {
  title: 'Mysite',
  tagline: 'tagline',
  favicon: 'img/favicon.ico',

  // Set the production url of your site here
  url: 'https://mysite.tld',
  // Set the /<baseUrl>/ pathname under which your site is served
  // For GitHub pages deployment, it is often '/<projectName>/'
  baseUrl: '/',
  trailingSlash: true,
  // ... (remaining options omitted in the original report)
};

This creates a sitemap.xml with trailing slashes, but the build then fails because of broken links to anchors:

Error: Unable to build website for locale xx.
    at tryToBuildLocale (/home/user/mytopic/node_modules/@docusaurus/core/lib/commands/build.js:55:19)
    at async mapAsyncSequential (/home/user/mytopic/node_modules/@docusaurus/utils/lib/jsUtils.js:44:24)
    at async Command.build (/home/user/mytopic/node_modules/@docusaurus/core/lib/commands/build.js:82:21) {
  [cause]: Error: Docusaurus found broken links!

  Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
  Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

  Exhaustive list of all broken links found:
  - Broken link on source page path = /docs/sub1/:
     -> linking to ./mydoc/#table-1 (resolved as: /docs/sub1/mydoc/#table-1)
     -> linking to mydoc2/#table-2 (resolved as: /docs/sub1/mydoc2/#table-2)
 (removed a list of more broken links)

It looks like links to anchors are not created properly. The directories here are mydoc and mydoc2, the anchors referenced on the index pages are #table-1 and #table-2.

The links in the Markdown file look like this:

[table 1](./mydoc#table-1) 
[table 2](mydoc2#table-2)

Your environment

  • Public source code: Docusaurus
  • Public site URL: n/a
  • Docusaurus version used: 3.2.1
  • Environment name and version (e.g. Chrome 89, Node.js 16.4): node v18.19.0
  • Operating system and version (e.g. Ubuntu 20.04.2 LTS): Debian 12 (bookworm).

Self-service

  • I'd be willing to fix this bug myself.

> Trailing slashes should only be used for actual directories.

No, that's not how this feature is designed, sorry. There's not even a concept of "directory" in Docusaurus, only "docs categories".


FYI, we recently fixed a bug where the trailing slash was not being applied to the sitemap: #9920


301 redirect is a server/host concern, not a Docusaurus concern. If your host serves 301 instead of 200, then you have to configure your host so that it serves 200 instead of 301.


Those links are standard HTML relative links. If you want your pages to end with / then your links must contain that trailing slash too, that's how HTML links work.

[table 1](./mydoc#table-1) 
[table 2](mydoc2#table-2)

We have a whole doc section explaining why we don't recommend this kind of link, in particular because it is not portable across trailingSlash settings.

https://docusaurus.io/docs/markdown-features/links
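To illustrate the point about standard HTML relative links, here is a minimal sketch using the WHATWG URL class built into Node.js and browsers. It only shows how a generic resolver treats the links from this issue; it is not Docusaurus's own link checker, and the origin and paths are the illustrative ones from the report.

```javascript
// The same relative link, resolved against the same page served
// with and without a trailing slash.
const link = './mydoc#table-1';

const fromSlashPage = new URL(link, 'https://mysite.tld/docs/sub1/');
const fromNoSlashPage = new URL(link, 'https://mysite.tld/docs/sub1');

// A link that carries its own trailing slash before the anchor.
const slashLink = new URL('./mydoc/#table-1', 'https://mysite.tld/docs/sub1/');

console.log(fromSlashPage.pathname + fromSlashPage.hash);     // /docs/sub1/mydoc#table-1
console.log(fromNoSlashPage.pathname + fromNoSlashPage.hash); // /docs/mydoc#table-1
console.log(slashLink.pathname + slashLink.hash);             // /docs/sub1/mydoc/#table-1
```

The relative link resolves to a different path depending on whether the current page URL ends with a slash, and the resolver never adds a trailing slash to the target on its own. That is why hand-written URL links are sensitive to the trailingSlash setting.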


I'm closing because no concrete repro was provided, this issue is quite messy, and to me this works as intended unless proven otherwise.

If you want to discuss things further, please create a runnable https://docusaurus.new/stackblitz repro.

commented

So how would you configure the most used web server, Apache 2, not to use 301 redirects? Apache adds slashes by default: DirectorySlash On.

The root cause seems to be that sitemap.xml does not contain slashes while Apache requires slashes. A workaround would be to create sitemap.xml with trailing slashes as an option.

StackBlitz obviously does not replicate a real-world setup. Do you mean you can't reproduce sitemap.xml being generated without trailing slashes?

> So how would you configure the most used web server, Apache 2, not to use 301 redirects?

This is not an option we recommend using. I'd suggest using Vercel or Netlify or, if you cannot, GitHub Pages.

If you want to use Apache2, then it's your responsibility to figure out how to configure it to serve a static deployment appropriately. I don't use Apache and I can't advise you on how to configure it, although I'm pretty sure I have already seen people using it successfully.

Docusaurus is only responsible for building a static deployment, not hosting it.

If you think our sitemap has a bug, then provide a repro and show the actual sitemap and the expected sitemap, given a fixed set of options. The expected behavior is that the sitemap contains URLs with or without / depending on the trailingSlash config. The sitemap is expected to target the exact canonical URL of each page, so if pages have / in their canonical URL, the sitemap should also contain a trailing slash.
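One way to back up such a report is to compare the generated sitemap against the trailingSlash setting mechanically. The sketch below is a hypothetical helper, not part of Docusaurus: the function names and the sample sitemap are assumptions for illustration, it handles only trailingSlash values of true or false, and it treats every non-root path the same way.

```javascript
// Hypothetical helper: pull every <loc> value out of a sitemap string.
function extractLocs(sitemapXml) {
  return [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

// Return the URLs whose trailing slash does not match the trailingSlash setting.
function findMismatchedLocs(sitemapXml, trailingSlash) {
  return extractLocs(sitemapXml).filter((loc) => {
    const { pathname } = new URL(loc);
    if (pathname === '/') return false; // the site root always ends with "/"
    return pathname.endsWith('/') !== trailingSlash;
  });
}

// Made-up sample; a real check would read build/sitemap.xml instead.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.tld/</loc></url>
  <url><loc>https://mysite.tld/docs/sub1/</loc></url>
  <url><loc>https://mysite.tld/docs/sub1/mydoc</loc></url>
</urlset>`;

// With trailingSlash: true, only the URL without a trailing slash is reported.
console.log(findMismatchedLocs(sampleSitemap, true));
```

An empty result for the configured trailingSlash value would suggest the sitemap matches the expected behavior described above; a non-empty result gives a concrete actual-versus-expected list to attach to a repro.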