Pretty URLs: Problems with `.html` extension in file names
straight-shoota opened this issue · comments
In Jekyll, the output paths for pages have a `.html` extension by default. For example: `/community/governance.html`
This is not very pretty, and the extension is practically irrelevant.
Users typically expect extension-free URLs. One example is the links to https://crystal-lang.org/community/governance from crystal-book. This URL is currently a 404 because the extension is missing. It must be https://crystal-lang.org/community/governance.html
A trivial fix for the correct URL is in crystal-lang/crystal-book#643
But the general problem is that people write pretty URLs. You shouldn't have to type `.html` at the end of a URL.
I propose to change the default URL format for pages to `permalink: /:path/:basename/`. This generates pretty URLs (it basically creates folders with an `index.html`, which the web server then serves for the folder path).
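To make the folder-plus-`index.html` idea concrete, here is a minimal sketch of the mapping that a `/:path/:basename/` permalink implies. It only mimics the behaviour described above and is not Jekyll's own code; the function name `pretty_output` and the `.md` source file names are assumptions for illustration.

```python
from pathlib import PurePosixPath


def pretty_output(source_path):
    """Return (url, output_file) for a page under a /:path/:basename/ permalink.

    Illustrative only: this mirrors the mapping described in the issue,
    it does not call into Jekyll.
    """
    page = PurePosixPath(source_path)
    # :path is the page's directory, :basename is the file name without extension.
    target = page.parent / page.stem
    url = f"/{target}/"
    # The page is written as index.html inside a folder named after the page.
    output_file = str(target / "index.html")
    return url, output_file


# Hypothetical source files; the real source extensions may differ.
for source in ["community/governance.md", "docs.md"]:
    print(source, "->", pretty_output(source))
```

The web server then answers a request for the folder URL by serving the `index.html` inside it, which is why no extension shows up in the address.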
A superior solution that keeps both variants (so existing links don't break) would be to make the `.html` extension optional directly in the web server. In an nginx config this would be something like `try_files $uri $uri.html $uri/ =404;`.
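As a rough illustration of the lookup order that such a `try_files` line asks for, the following sketch walks the same candidates against a set of existing files. It is a simplified model, not nginx itself: the `$uri/` step is approximated as "the folder contains an `index.html`", and the helper name `try_files` and the sample file set are assumptions.

```python
def try_files(uri, files):
    """Model the order of `try_files $uri $uri.html $uri/ =404`.

    `files` is a set of absolute paths that exist under the web root.
    Returns the path that would be served, or None for the 404 fallback.
    """
    candidates = [
        uri,                               # $uri: the path exactly as requested
        uri + ".html",                     # $uri.html: extension made optional
        uri.rstrip("/") + "/index.html",   # $uri/: simplified to the folder's index page
    ]
    for candidate in candidates:
        if candidate in files:
            return candidate
    return None  # corresponds to =404


# Hypothetical site contents for the example.
site = {"/community/governance.html", "/docs/index.html"}
for request in ["/community/governance", "/community/governance.html", "/docs", "/missing"]:
    print(request, "->", try_files(request, site))
```

With this order, both the old `.html` link and the extension-free link resolve to the same file, which is what keeps existing links working.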
I presume there is nothing like this for S3 where we're hosting the website, right? @matiasgarciaisaia any idea?
We're hosting on AWS S3, which supports object-level redirections.
It'd be awesome if we found a Jekyll plugin that takes care of setting up the redirections, but, if not, we could manually list every URL we currently have (ie, the current sitemap), then enable pretty URLs, deploy and set up each redirection we need.
The general redirection rules match by prefix (not suffix), so we won't be able to use a single, general rule.
Yeah, I don't think custom rules in S3 for every single path are a practical solution.
So probably a good path forward is to set up Jekyll to generate pretty URLs by default. For existing pages with `.html` paths, we can create custom redirects to the pretty paths inside Jekyll.
I would ignore blog posts for now.
That leaves us with only a few paths:
$ find _site -name '*.html' -not -name index.html -not -path '_site/20*'
_site/community/governance.html
_site/docs.html
_site/sponsors/original-sponsors.html
_site/learning/crystal_programming.html
_site/404.html
`404.html` is only used internally, so we can ignore that.
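For illustration, the list above could be turned into an old-path to pretty-path redirect table with a few lines of Python. This is only a sketch under assumptions: the helper name `redirect_map` is made up, the trailing-slash targets assume the `/:path/:basename/` permalink format, and how the redirects are then installed (inside Jekyll or on the host) is left open.

```python
# Output of the find command above, pasted verbatim.
FOUND = """\
_site/community/governance.html
_site/docs.html
_site/sponsors/original-sponsors.html
_site/learning/crystal_programming.html
_site/404.html
"""


def redirect_map(found, site_dir="_site", skip=("/404.html",)):
    """Map each existing .html URL path to its extension-free replacement."""
    redirects = {}
    for line in found.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the build directory prefix to get the URL path.
        old = line[len(site_dir):] if line.startswith(site_dir) else line
        if old in skip:
            continue  # 404.html is only used internally
        # Assumed target format: folder-style URL with a trailing slash.
        redirects[old] = old[: -len(".html")] + "/"
    return redirects


for old, new in redirect_map(FOUND).items():
    print(old, "->", new)
```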
Is there any reason to avoid the blogposts? I'd do it for every page, so we forget this existed 😇
Post paths are already clunky, even without the `.html` extension 🤷 That makes it less of a problem, because you don't just type a post URL somewhere; it's always copy & paste. That's different for pages with easier paths, which also serve as landing pages.
And it's more complex to implement individual redirects for 100+ post paths.
Maybe a plugin could help with that, but I just think it's probably not worth the effort if we can't have a simple, general redirect rule configuration somewhere.
I think the same script we'll use to create an object-by-object redirect (using the output of your `find` command up there) would be enough to also redirect blog posts.
I mean - it's probably just you and me manually typing URLs. People don't do that :P Let's aim for consistency, at least.
Not sure how exactly you want to do that. But I've created #352 which moves the pages to pretty URLs.
For blog posts, the change is trivial:
--- i/_config.yml
+++ w/_config.yml
@@ -53,7 +53,7 @@ defaults:
type: releases
values:
layout: post
- permalink: /:year/:month/:day/:title.html
+ permalink: /:year/:month/:day/:title
image: /assets/icon.png
twitter:
Then we just need redirects.
Resolved by #352