wagtail / wagtail

A Django content management system focused on flexibility and user experience

Home Page: https://wagtail.org

Wildcard redirects

RealOrangeOne opened this issue · comments

Is your proposal related to a problem?

Currently, it's not possible to specify redirects with wildcards in them, which allow matching anything under a path.

Describe the solution you'd like

It should be possible to specify one or more wildcards as part of the redirect source URL, and then redirect to a single page. For example, redirecting /foo/* to /bar.

In an ideal world, it'd be nice to use capture groups to extract parts of the URL and reuse them in the redirect target, but that might be better as a second phase.

Describe alternatives you've considered

I implemented part of a solution in #7528, which worked fairly well in theory, but required using % as the wildcard placeholder, and leant hard on MATCHES in SQL, which may not be great for usability.

Additional context

Here are some examples of how Cloudflare handles redirects: https://developers.cloudflare.com/rules/url-forwarding/single-redirects/examples/

Working on this

Anyone can contribute to this. View our contributing guidelines, and add a comment to the issue once you’re ready to start.

Redirects are checked as part of any page's 404 response, so it's important this process be performant.

@RealOrangeOne I would like to give this a try. I'd really appreciate it if you could provide guidance on how I can approach implementing this feature.

I don't have an implementation in mind, you might need to do some research into potential ways to achieve this.

Thanks @RealOrangeOne, I believe we can update the _get_redirect method in RedirectMiddleware to handle wildcard redirects.

def _get_redirect(request, path):

I think this way we can implement the feature with minimal changes. Here's an example of how we could do it:

wildcard_redirects = models.Redirect.get_for_site(site).filter(old_path__endswith='*')
for redirect in wildcard_redirects:
    wildcard_path = redirect.old_path.rstrip('*')
    if path.startswith(wildcard_path):
        return redirect

I tried this locally and it works well. I would love to hear your thoughts on this.

Prefix matching is definitely better than nothing, but proper glob redirects would be nice. You might be able to achieve that by finding the first * in a string, doing a DB query for every redirect which starts with that, then running glob.glob on those matched paths? Not the most efficient, but it's still probably fairly fast, without needing to add a lot of extra complexity into the DB (using a regex probably isn't what we want)

Thanks @RealOrangeOne, I followed your suggestions and created a draft #11628. Can you please share your thoughts on it?

I've taken a look and added some comments. On second thoughts, it might not be the best approach. Looking for any redirect with a * in it is going to be quite an expensive operation. I think the only way to do this is in the DB, which does increase the complexity a fair bit.

Okay, if performing the wildcard matching directly in the database seems like the best approach, then I can explore that option further.

Should I take #7528 as a reference?

That's a good reference, yes. Although I think relying on DB wildcard literals may not be best. Perhaps pre-processing the search paths to use a * rather than %, but exactly how to do that I don't know. I'm also not sure if there are any security implications around using user-controlled strings in MATCHES statements (there probably isn't, but I don't know).

Regexes might also work, but that'd require users entering regexes for redirects, which has its own problems, both in terms of usability and security.

Oh okay. Could preprocessing the path introduce unforeseen problems, especially if deliberate use of '*' leads to unexpected behavior in the database? Also, I'm not aware of the potential security risks associated with this approach.

Therefore, should we consider sticking with the current method (following your suggestions on it) until we can find a proper way to handle this through the database?

Wildcard redirects turn out to be quite difficult to implement, and likely not useful to enough people to warrant the added overhead.

Looking at https://pypi.org/project/django-redirects/, another redirects implementation for Django, it supports exact, prefix, or full regex matching. Supporting regexes is probably more complex than we want, and might be better suited for elsewhere, but prefix matching ought not to be too complex, and gives most of the benefits. Adding a "type" field, so a redirect can be either exact or a prefix, together with a simple __startswith query, shouldn't impact performance too much.

With that said, this is all one person's fairly quick thinking about how this ought to work. This probably wants some proper thinking about the right way to implement this, what we're trying to solve, and if it's even something we want.
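To make the exact-versus-prefix idea concrete, here is a small sketch in plain Python, with no Django involved. The `match_type` field, the `Redirect` dataclass, and the precedence rules (exact beats prefix, longer prefix beats shorter) are all assumptions for illustration, not anything Wagtail defines.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Redirect:
    old_path: str
    target: str
    match_type: str = "exact"  # hypothetical field: "exact" or "prefix"


def find_redirect(path: str, redirects: Iterable[Redirect]) -> Optional[Redirect]:
    """Pick the redirect for `path`, preferring exact matches."""
    prefix_hits = []
    for redirect in redirects:
        if redirect.match_type == "exact" and redirect.old_path == path:
            return redirect  # an exact match always wins
        if redirect.match_type == "prefix" and path.startswith(redirect.old_path):
            prefix_hits.append(redirect)
    # Among prefix matches, the longest (most specific) prefix wins.
    return max(prefix_hits, key=lambda r: len(r.old_path), default=None)
```

With a prefix redirect from `/blog/` to `/news` and an exact redirect from `/blog/about` to `/about`, a request for `/blog/about` gets the exact target while `/blog/post-1` falls through to the prefix.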

@RealOrangeOne I completely agree with what you said. So, following your suggestion, I am holding off on this until we have proper input and thoughts from other members as well!

Just as a datapoint, since pattern-based redirects weren't implemented, I just had to roll my own, making things more brittle and less standardized. There are definitely tradeoffs to being too conservative when adding bells and whistles to a platform.

Here's my temp solution:

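import importlib

from django.core.handlers.wsgi import WSGIRequest
from django.http import Http404, HttpResponse
from django.urls import Resolver404, ResolverMatch, URLResolver, get_resolver
from django.urls.resolvers import RegexPattern
from django.utils.functional import cached_property


class URL404FallbackMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response

    @cached_property
    def patterns(self):
        # Build a resolver from the optional fallback_patterns list in the root URLconf.
        root = get_resolver()
        fallback_patterns = []
        try:
            fallback_patterns.extend(importlib.import_module(root.urlconf_name).fallback_patterns)
        except (ModuleNotFoundError, AttributeError):
            pass
        return URLResolver(RegexPattern(r"^/"), fallback_patterns)

    def process_exception(self, request, exception):
        # Only handle 404s for paths ending in a slash; leave everything else alone.
        if not isinstance(exception, Http404) or not request.path.endswith("/"):
            return
        try:
            resolved: ResolverMatch = self.patterns.resolve(request.path)
        except Resolver404:
            return
        return resolved.func(request, *resolved.args, **resolved.kwargs)

    def __call__(self, request: WSGIRequest) -> HttpResponse:
        return self.get_response(request)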

Add this middleware to MIDDLEWARE in settings, and add a fallback_patterns module-level attribute to the main urls.py.

To redirect blog/anything (after Wagtail fails to resolve the page) back to /blog, you can set something like this:

fallback_patterns = [re_path(r"^blog/.", RedirectView.as_view(url="/blog/"))]