Opened 4 years ago

Closed 4 years ago

#31070 closed New feature (needsinfo)

Add a check for URLconfs that mix named and unnamed capture groups

Reported by: Baptiste Mispelon Owned by: Baptiste Mispelon
Component: Core (System checks) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When using re_path(), Django supports the following types of capture groups:

1) Named: (?P<year>\d+)/(?P<month>\d+)/
2) Unnamed: (\d+)/(?\d+)/
3) Mixed: (?P<year>\d+)/(\d+)/

If you have a view with the following signature: def view(request, *args, **kwargs) and a URL /2019/11/, then the view will be called with the following arguments (respectively):

1) args=(), kwargs={'year': '2019', 'month': '11'}
2) args=('2019', '11'), kwargs={}
3) args=(), kwargs={'year': '2019'}

Case number 3 is a bit surprising but it's documented both in the reference docs [1]:

When a match is made, captured groups from the regular expression are passed to the view – as named arguments if the groups are named, and as positional arguments otherwise.

and in the topics docs [2]:

When both styles are mixed, any unnamed groups are ignored and only named groups are passed to the view function.

The topics doc actually discourages the use of unnamed capture group:

This usage isn’t particularly recommended as it makes it easier to accidentally introduce errors between the intended meaning of a match and the arguments of the view.

I propose introducing a new urls.W009 check for the URLConfs that would warn the user when they've configured a route that mixes both named and unnamed capture groups. As a remedy, it would suggest converting the unnamed groups to non-capturing groups ((?:\d+) syntax with our current example).

[1] https://docs.djangoproject.com/en/dev/ref/urls/#re-path
[2] https://docs.djangoproject.com/en/dev/topics/http/urls/#using-unnamed-regular-expression-groups

Change History (12)

comment:1 by Baptiste Mispelon, 4 years ago

Has patch: set
Owner: changed from nobody to Baptiste Mispelon
Status: newassigned

comment:2 by Baptiste Mispelon, 4 years ago

While working on this, I also realized that it would be fairly easy to check whether a view's signature matches the capture groups. This would help catch errors like in #31069.

Here's an example implementation using a warning (it could probably be upgraded to an error but I don't know if I might be missing weird edgecases):

diff --git a/django/urls/resolvers.py b/django/urls/resolvers.py
index f60d19d756..e33b687e5e 100644
--- a/django/urls/resolvers.py
+++ b/django/urls/resolvers.py
@@ -357,6 +357,7 @@ class URLPattern:
     def check(self):
         warnings = self._check_pattern_name()
         warnings.extend(self.pattern.check())
+        warnings.extend(self._check_view_signature())
         return warnings
 
     def _check_pattern_name(self):
@@ -373,6 +374,36 @@ class URLPattern:
         else:
             return []
 
+    def _check_view_signature(self):
+        """
+        Try to see if the callback's signature matches the regex capture groups
+        and the default_args
+        """
+        try:
+            signature = inspect.signature(self.callback)
+        except (ValueError, TypeError):
+            return []
+
+        if self.pattern.regex.groupindex:
+            # pattern has named capture groups
+            args, kwargs = (), {**self.default_args, **self.pattern.regex.groupindex}
+        else:
+            args, kwargs = (None,) * self.pattern.regex.groups, self.default_args
+
+        args = (None,) + args  # for the request object
+
+        try:
+            signature.bind(*args, **kwargs)
+        except TypeError:
+            return [Warning(
+                "Your URL pattern {} doesn't match the signature of the "
+                "corresponding view. Consider adding a `default_kwargs` to "
+                "the path.".format(self.pattern.describe()),
+                id="urls.W010",
+            )]
+        else:
+            return []
+
     def resolve(self, path):
         match = self.pattern.match(path)
         if match:

Should I open a separate ticket for this?

comment:3 by Carlton Gibson, 4 years ago

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

Hi Baptiste. Thanks for this.

I think on the main issue here, a warning would be OK. So, yep, let´s accept.

As a remedy, it would suggest converting the unnamed groups to non-capturing groups ((?:\d+) syntax with our current example.

I don´t understand the reasoning for this suggestion. A non-capturing group would not be returned by the groupdict() call on the match, so would not be passed to the view.
So (I think) it´s not what users want/need/should be doing. Can you clarify?

Rather (I think) in the mixed case, we should just use "Prefer named capturing groups", assuming folks know what they´re doing if they´ve used just unnamed groups.

What do you think? (Perhaps I´m just missing something...)

On the second suggestion, yes it would be a separate ticket but, I don´t see the utility. All CBVs end up with a rather generic Signature(self, request, *args, **kwargs), so a check here isn´t going to catch much, (again) I think... ?

in reply to:  3 comment:4 by Baptiste Mispelon, 4 years ago

Replying to Carlton Gibson:

Hi Carlton,

My reasoning for suggesting a non-capturing group was to assume that the current pattern was working correctly as-is.
In the case of a mixed pattern (mixing named and unnamed capture groups) it means that all the unnamed groups can be rewritten as non-capturing ones and the pattern should keep working exactly the same (because of Django's documented behavior of dropping unnamed groups when a mixed pattern is used).
So the warning would encourage users to be more explicit about what they're capturing or not.

As for the second issue (checking view signatures), I hadn't thought about the case of CBVs.
I just checked and CBVs actually inherit the signature of their dispatch() method (because of the calls to update_wrapper which sets the __wrapped__ attribute used by inspect.signature(...) [1]).
Still, I'm not sure if a lot of people actually bother refining the signatures of their dispatch() methods to match their use-case.

I also assume that the majority of signature-mismatch errors will be caught directly when trying to use the view. I guess the only edge-case that my proposal would catch would be an optional capture group that doesn't match the view's signature. That doesn't seem like a huge win, I agree.

[1] https://github.com/django/django/blob/d8e233352877c37c469687287e7761e05bdae94e/django/views/generic/base.py#L78

comment:5 by Carlton Gibson, 4 years ago

Hi Baptiste.

Thanks for taking to the time to clarify/expand.

...assume that the current pattern was working correctly as-is.

Hmm. In that case I'm not sure of the value here.

From #31061, I had it in my mind that there was an actual error in play.
(But if an optional capture group—(?P<...>...)? fails to match, I guess the positional args might still come into play...)

That not being the case I'm not sure we should start raising a warning for code that has no-doubt been working for years.
(We get a lot of complaints when we do that, rightly I think)

I'd lean to wontfix in that case. What do you think?

in reply to:  5 comment:6 by Baptiste Mispelon, 4 years ago

Replying to Carlton Gibson:

From #31061, I had it in my mind that there was an actual error in play.
(But if an optional capture group—(?P<...>...)? fails to match, I guess the positional args might still come into play...)

I don't believe that's the case actually.
If there are named capture groups Django will always discard any unnamed groups, even if the named ones didn't match anything (this behavior had been broken and was restored as described in #31061).

Confusing, right?
That's why I thought having a warning to suggest you don't mix named and unnamed capture groups would be worthwhile.

The scenario I had in mind is this one:
1) Developer adds a new mixed re_path to their URLconf, thinking that Django will pass all groups (named and unnamed) to the view
2) The new warning kicks in, alerting them about the potential issue
3) If it was a bug, they fix it by changing the unnamed group to a named one. If it was by design, they change the unnamed group to a non-capturing one.

For someone who's been using a mixed pattern for a while then it would start giving them a warning but the fix is easy enough and I would argue it makes the regexp more correct (explicit vs implicit and all that).
With the recent addition of simplified patterns (path vs re_path) and the moving of the urls package (django.core.urls -> django.urls), I feel that the urls.py in people's projects have probably seen some changes with recent versions.

But all-in-all, I don't feel too strongly about this. I came to this via #31061 which made me dive into URL dispatcher internals where I discovered this quirk. I thought this check could help prevent issues but if you think it's not worth it and want to mark this as wontfix, I wouldn't oppose it.

comment:7 by Carlton Gibson, 4 years ago

Right, OK... So I come back to my initial thought:

Rather (I think) in the mixed case, we should just use "Prefer named capturing groups", assuming folks know what they´re doing if they´ve used just unnamed groups.

i.e. the bit I'm (still) not quite following you to is:

As a remedy, it would suggest converting the unnamed groups to non-capturing groups ((?:\d+) syntax with our current example).

They'd still be mixed patterns and fail getting passed to the view.

This:

... assume that the current pattern was working correctly as-is.

I think that's probably not right no? I have a mixed pattern, of which only the named groups will ever be passed to the view, so I don't see how the pattern can be thought of as working correctly. ??? (Sorry if I'm being really slow here...)

So tl;dr: I think we should have a warning for mixed patterns, but say to use all named groups, rather than a mix.

Version 0, edited 4 years ago by Carlton Gibson (next)

comment:8 by Baptiste Mispelon, 4 years ago

I agree that named groups should be preferred over unnamed groups (the documentation currently suggests that too). My problem is that it doesn't seem like a backwards-compatible suggestion.

Consider the case of someone who has a working mixed-pattern URLConf, something like:

^entry/(?P<pk>\d+)(\.html|/)$

In that case, changing the unnamed capture group to a named one will most likely break because the view will start receiving a new keyword argument which is probably incompatible (the view could be defined as def blog_entry(request, pk) for example).

In contrast, changing the unnamed capture groups to non-capturing ones is 100% backwards compatible and everything will keep behaving the same: the view will keep receiving the same keyword arguments as it did before.

Here's a table that summarizes Django's behavior when matching the URL /entry/123.html:

Pattern view call
^entry/(?P<pk>\d+)(\.html|/)$ blog_entry(pk='123')
^entry/(?P<pk>\d+)(?P<ext>\.html|/)$ blog_entry(pk='123', ext='.html')
^entry/(?P<pk>\d+)(?:\.html|/)$ blog_entry(pk='123')

TLDR: What if the warning message was something like this (hopefully we can make it a bit less verbose while keeping it clear):

Warning: you're mixing both named capture groups (syntax (?P<name>...) and unnamed capture groups (syntax (...)) in your URL pattern XXX.
When you mix those, Django will only pass the named capture groups to the view while the unnamed ones will be discarded.
Consider naming all your capture groups or alternatively, if you're relying on the silently-discarding-unnamed-captured-groups behavior then consider using unnamed capture groups for them (syntax (?:...)).

comment:9 by Carlton Gibson, 4 years ago

OK, nearly with you. Thank you for your patience.

Consider the case of someone who has a working mixed-pattern URLConf, something like:
^entry/(?P<pk>\d+)(\.html|/)$

In this case, when is the (\.html/) bit ever used? Just as an optional format maybe (\.html)? — so I can add it or not but it never affects the view in any real way? (I guess the question is "Why would I put it in a group at all if it were not for something?")

If this kind of usage makes sense (which I guess in the "optional format" example it maybe does — but even then... surely the view wants that info!!!) then I can't see that we should raise a warning, again, for patterns that have been working forever™.

The possible error it would be worth catching is "You do know your non-named groups won't ever be used".

My concern here is a lot of warnings for perfectly valid patterns. (Would folks use groups not meaning for them to be passed to the view?)

comment:10 by Carlton Gibson, 4 years ago

Aside: You may think, "Why is Carlton being such a pain about this? It's minor." If so, the answer is that we've get stung by over zealous system checks multiple times. We put them in, get complaints and have to pull them out. I'd just rather measure twice, cut once.

in reply to:  9 comment:11 by Baptiste Mispelon, 4 years ago

Replying to Carlton Gibson:

In this case, when is the (\.html/) bit ever used? [...]

In my example I used ...(\.html|/)$ which has a sneaky | (it's easy to miss, I should have pointed it out). That way, you could access the same view with either /entry/123/ or /entry/123.html.

I think it's pretty common to use a group when using the (A|B) syntax, especially when A and B are more complex than single-character regexes.
It's also common to use a group for optional things (as you pointed out). For example I could have rewritten the URLConf like so: ^entry/(?<pk>\d+)(\.html)?.

(I guess the question is "Why would I put it in a group at all if it were not for something?")

My best answer would be "because parentheses create capturing groups by default and the non-capturing group syntax is somewhat obscure therefore nobody bothers making non-capturing groups".

If this kind of usage makes sense (which I guess in the "optional format" example it maybe does — but even then... surely the view wants that info!!!) then I can't see that we should raise a warning, again, for patterns that have been working forever™.

The possible error it would be worth catching is "You do know your non-named groups won't ever be used".

My concern here is a lot of warnings for perfectly valid patterns. (Would folks use groups not meaning for them to be passed to the view?)

The potential for spurious warnings is there, I agree. There's a balance to be found between being annoying with unnecessary warnings and preventing real issues for our users.
You're making good arguments that this check might err too much on the "annoying" side but I don't have a technical solution to correct that.

Thinking about this more, don't really know why Django silently discards unnamed groups in the case of a mixed pattern (if I had to guess, I'd say it's a historical leftover of a limitation that doesn't exist anymore).
Maybe this warning could be the first step of a deprecation path that would end in Django passing all captured groups to the view both named and unnamed? Or is that too ambitious?

comment:12 by Carlton Gibson, 4 years ago

Resolution: needsinfo
Status: assignedclosed

HI Baptiste. Right, thanks for working with me.

My suspicion here is that things are fine as they are. Let's leave off adding the warning, at least for now.
(I hope that makes sense, I think it does given the conversation...)

...first step of a deprecation path that would end...

If you think there's a real gain to be had, I'm happy if you want to flesh that out a little bit and take it to the mailing list to see if folks think it worth the churn.
(That could well be...) If so we can re-open and push on with that.

Note: See TracTickets for help on using tickets.
Back to Top