#730 closed enhancement (fixed)
more explicit middlware ordering
Reported by: | hugo | Owned by: | Andrew Badr |
---|---|---|---|
Component: | Core (Other) | Version: | |
Severity: | normal | Keywords: | |
Cc: | oliver@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
In a django-developer discussion the problem of middlware ordering came up. I think a solution might be to have an optional way to give an explicit middleware ordering for the response phase.
This could be done in a way so that if the user doesn't give an explicit ordering, the system automatically uses the reversed middleware list - that's the behaviour we have now.
But if the users defines a setting for middlewar response order, the system could check that the content is identical in content, but can be different in order. That way people can give explicit order and response handling would be much more explicit and understandable.
Attachments (8)
Change History (48)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
Taking up the idea of Sune in the django-developer list (about splitting middleware whose process_request and process_response order conflict), would it maybe really be simpler to just allways use the same order for middleware, instead of reversing it in process_response? At least it would make it obvious again in what order they are handled. And if the order in process_response can't be used in the same order as in process_request, maybe the solution really would be to just split that middleware.
I think I know what I dislike about the current solution: many of the django-delivered middleware is order-sensitive in the process_response phase, while the process_request phase just is some setup step (or completely missing). But we still order them with regard to the handling of process_request ...
comment:3 by , 19 years ago
I'm all for changing the middleware algorithm so that it doesn't reverse it in process_response. That was Jacob's original idea when he set up the system -- maybe he can shed some light on it?
comment:4 by , 19 years ago
Uhmm... The cache-middleware depends on this feature, so please don't change it without a very good reason.
comment:5 by , 19 years ago
sune: actually the CacheMiddleware doesn't rely on the specific reversed order - it just relies on _one_ defined order, so users know where to put it in the stack of middleware :-)
comment:6 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:8 by , 17 years ago
Triage Stage: | Design decision needed → Someday/Maybe |
---|
At this point, changing the default response middleware ordering could break a huge amount of code. I wouldn't be opposed to something like a RESPONSE_MIDDLEWARE_ORDERING
setting (or whatever), but so far at least I haven't seen a huge need for it.
comment:9 by , 17 years ago
Alternatively, ond could list explicit dependencies directly in the middleware classes themselves, and have the ordering managed automatically.
by , 16 years ago
Attachment: | mw-depends.diff added |
---|
Adding example dependencies and testing dependencies are met during management's validate.
comment:10 by , 16 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
While it doesn't solve all the concerns brought up with this ticket, the mw-depends.diff
patch shows adding and testing of dependencies of middleware. I originally set out to build a middleware graph and test for cycles (bad) and dependencies but stumbled upon this solution which was much simpler. So for tonight, at least, that's where I'm stopping. Note: Not all the built-in middleware classes have the depends
attribute added -- I consider this not done.
As sample output, I've reversed the session and auth middleware to trigger an error on manage.py validate
and here is what it looks like:
django.contrib.auth.middleware.AuthenticationMiddleware: Dependency on "django.contrib.sessions.middleware.SessionMiddleware" not met. Check to make sure the dependent middleware is included in MIDDLEWARE_CLASSES and is listed above AuthenticationMiddleware.
follow-up: 12 comment:11 by , 16 years ago
Looks pretty nice to me. Two improvements I'd make:
- I'm not sure that
continue
'ing for invalid entries is correct; might be better to complain about the incorrect entry. Not sure here, though - is there a reason you choose to do it this way? - I'd take away the empty "depends" declarations in a few of the classes - looks cleaner to simply not provide it if it's not needed.
comment:12 by , 16 years ago
Replying to jacob:
I'm not sure that
continue
'ing for invalid entries is correct; might be better to complain about the incorrect entry. Not sure here, though - is there a reason you choose to do it this way?
Simply because I wasn't ready to think about the right thing to do. Most of the middleware loading code is a DRY violation since it's directly copied from django.core.handlers.base.BaseHandler's load_middleware method, except for this method sets up which type of middleware it is (e.g. request, response, etc.) Would it make sense to abstract this out a little so it's not repeated -- make a load_middleware_instance(mw_path) method that returns an instance of the middleware given the path? Then this method can raise whatever exceptions it finds and the validation can put it in a try/except block. That's a bit of refactoring I wasn't ready to commit to last night and would like some encouragement before I go down that path. If I were to pull it out so both bits of code can re-use it, where would be a proper place for it to be?
I'd take away the empty "depends" declarations in a few of the classes - looks cleaner to simply not provide it if it's not needed.
I can remove it, but I like it because then the developer will know that this middleware has been "updated" to specify what the middleware depends on and it has no dependencies.
Thanks for the feedback.
comment:13 by , 16 years ago
Triage Stage: | Someday/Maybe → Accepted |
---|
follow-up: 16 comment:14 by , 16 years ago
I think the approach taken by Gentoo's init script would fit well here. Basically, there are three keywords inits scripts can use: depends, uses and after (I think all of them are self explanatory). This allows, for example, to mark a middleware class which need to be the last one as " after = '*' ". Otherwise, if you made it dependant on every Django middleware class, you're forcing everyone using custom middleware to patch Django (or allowing them to misconfigure the middleware stack).
comment:15 by , 16 years ago
Well, I forgot about "before", which is a supported keyword and would also make sense here.
comment:16 by , 16 years ago
Replying to Alberto García Hierro <fiam@rm-fr.net>:
I think the approach taken by Gentoo's init script would fit well here. Basically, there are three keywords inits scripts can use: depends, uses and after (I think all of them are self explanatory). This allows, for example, to mark a middleware class which need to be the last one as " after = '*' ". Otherwise, if you made it dependant on every Django middleware class, you're forcing everyone using custom middleware to patch Django (or allowing them to misconfigure the middleware stack).
I agree. I started with the depends stuff and when I started thinking about the caching middleware it started breaking down, especially with middleware that alters the Vary header.
comment:17 by , 16 years ago
Owner: | changed from | to
---|
I'm working on a patch to allow distinct request and response orderings.
by , 16 years ago
Attachment: | explicit_middleware_ordering_t730_r8166.diff added |
---|
comment:18 by , 16 years ago
Has patch: | set |
---|---|
Needs tests: | set |
This ticket shows the approach I decided to take, which is to have an optional "OUTGOING_MIDDLEWARE_ORDER" setting or default to current behavior (reverse). The patch seems to work correctly, but doesn't have tests or documentation yet.
I started implementing something like Gentoo's init scripts described above, but decided it would be too complicated to maintain. At the minimum, every middleware would need two new fields -- comes before and comes after -- and you could only do those if you knew the path to the other middlewares you were talking about. I think it would be fine for the built-in ones, but when you consider people using multiple third-party middlewares that don't know about each other, you start need extra middleware dependency information in your settings. Furthermore, it makes it hard to understand what your code is actually doing -- not everyone wants to do a topological sort.
That's why I chose this more straightforward approach, even though it is somewhat error-prone to manually reverse MIDDLEWARE_CLASSES.
#5691 is blocked on this.
comment:19 by , 16 years ago
In discussion with Malcolm, we discussed some concern about the fact that OUTGOING_MIDDLEWARE_ORDER
setting will always be necessary.
Users don't have the option to ignore OUTGOING_MIDDLEWARE_ORDER
if they aren't impacted by this problem, since it will have to exist in global_settings and will have to be overridden.
This also means that we are guaranteed to break any existing code where people have explicitly specified MIDDLEWARE_CLASSES
... and it will happen in a non-obvious way (it won't explode, it'll just go wonky).
We're still discussing this here and it looks like every solution comes with its own particularly hairy problems.
comment:20 by , 16 years ago
Could we use a guard value in global_settings? That is, ignore OUTGOING_MIDDLEWARE_ORDER if it's None or [] or whatever the default is?
comment:21 by , 16 years ago
Here is a "different but similar" approach to that taken by andrewbadr.
I've allowed MIDDLEWARE_CLASSES
to be either a tuple
/list
or dict
.
If it is a tuple
/list
, then it is treated as it's presently -- down for request/back up for response (and exceptions).
If it is a dict
, then it looks for keys labeled request
, response
, exception
, and view
and uses them in their natural order.
Downsides
- May cause DRY violation, though many middleware only care about 1 of the 4 types.
- Certainly doesn't lead to an 'attractive' settings file.
- Exposes the complexity of the problems discussed here to the developer. (Requires the developer to solve ordering problems with middleware)
Upsides
- Solves the underlying problem of different orderings being needed for different pieces of the request lifecycle
- Maintains explicit ordering.
- Users still has complete control over ordering (as opposed to depends/before/etc)
- Backwards compatible (people using the existing
MIDDLEWARE_CLASSES
specification will still be able to) - Only one setting -- rather than coupling
MIDDLEWARE_CLASSES
andOUTGOING_MIDDLEWARE_ORDER
- Addresses all middleware paths (rather than just response)
- Default is still pretty painless
- Should be intuitive as long as the developer looks at the example
Regardless, we still need tests and docs.
by , 16 years ago
Attachment: | middleware_dict__ticket_730__rev_8168.patch added |
---|
comment:22 by , 16 years ago
A couple issues:
1) (This applies to both of our patches.) Each middleware is instantiated multiple times, so it's impossible to share data from the inbound to the outbound processing as some middlewares currently do.
2) If I want to switch from tuple/list to a dict, I have to look at every middleware I'm using (Session, Authentication...) and figure out which methods it implements so that I know which dict keys to add it to.
by , 16 years ago
Attachment: | explicit_middleware_ordering_t730_r8166_v2.diff added |
---|
comment:23 by , 16 years ago
Just uploaded a revised version of the approach used by my first patch. Fixes issues (1) above and another bug when no outgoing order is specified.
Something I didn't mention is that this patch requires OUTBOUND_MIDDLEWARE_ORDER to have the exact same contents as MIDDLEWARE_CLASSES.
comment:24 by , 16 years ago
Another way to solve this would be to use (middleware, outbound_priority)
tuples in MIDDLEWARE_CLASSES
.
Inbound ordering is specified by the order of elements in MIDDLEWARE_CLASSES
as previously. Outbound ordering would be explicitly specified by the outbound_priority
value.
E.g. the following tuple
MIDDLEWARE_CLASSES = ( ('django.middleware.common.CommonMiddleware', 2), ('django.contrib.sessions.middleware.SessionMiddleware', 3), ('django.contrib.auth.middleware.AuthenticationMiddleware', 1), ('django.middleware.doc.XViewMiddleware', 0), )
would be ordered as follows:
>>> def outbound_order(): ... outbound = [ (mw[1], mw[0]) for mw in MIDDLEWARE_CLASSES ] ... outbound.sort() ... return [ mw[1] for mw in outbound ] ... >>> outbound_order() ['django.middleware.doc.XViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware']
comment:25 by , 16 years ago
Although this looks uglier than OUTBOUND_MIDDLEWARE_ORDER, it overcomes the problems __hawkeye__
outlined in #730:comment:19 and avoids having to duplicate all middleware in OUTBOUND_MIDDLEWARE_ORDER
if the latter is required. Duplication violates DRY and is fragile -- if you add or remove something from MIDDLEWARE_CLASSES
, you have to remember to update OUTBOUND_MIDDLEWARE_ORDER
accordingly.
The multiple instantiation problem mentioned above can easily be avoided by creating a list from the tuple that contains (middleware_instance, outbound_priority)
tuples first and use these instances to fill inbound and outbound middleware lists.
Perhaps a more eye-pleasing (but more obscure?) way to specify ordering is to use "weight" instead of "priority", i.e. "heavier" items will be processed before lighter ones:
MIDDLEWARE_CLASSES = ( ('django.middleware.common.CommonMiddleware', 0), ('django.contrib.sessions.middleware.SessionMiddleware', 1), ('django.contrib.auth.middleware.AuthenticationMiddleware', 3), ('django.middleware.doc.XViewMiddleware', 2), )
and correspondind outbound processing order:
[('django.contrib.auth.middleware.AuthenticationMiddleware', 3), ('django.middleware.doc.XViewMiddleware', 2), ('django.contrib.sessions.middleware.SessionMiddleware', 1), ('django.middleware.common.CommonMiddleware', 0)]
Note that the outbound_order
function above is just an illustration. An actual implementation should check for outbound_priority
uniqueness.
by , 16 years ago
Attachment: | middleware_ordering_with_path-weight_tuples.diff added |
---|
Note that the patch can be made backwards-compatible by adding additional isinstance(middleware, tuple)
checks. Whether that is good remains open.
by , 16 years ago
Attachment: | middleware_ordering_with_path-weight_tuples-backwards-compatible.diff added |
---|
Added backwards-compatibility and uniqueness checks. All tests pass as of r8219. See notes below.
comment:26 by , 16 years ago
Notes on the last patch:
- for backwards-compatibility,
MIDDLEWARE_CLASSES
is processed exactly like before if they are not tuples - if they are tuples, they are interpreted as
(middleware_instance, weight)
as specified above (i.e. heavier items processed first when outbound) weight
values have to be unique to avoid typos-confusion-whatnot- only one instance is created of each middleware class
- tests pass
follow-up: 28 comment:27 by , 16 years ago
mrts, I like that your approach is more DRY, but it does make it harder to understand what is going on by reading the settings (you have to sort middleware in your head). Not sure which I prefer. Anyways, two comments: (1) I was opposed to enforcing weight-uniqueness at first, because that makes it hard to insert in the middle, but then I realized that floats would work fine. Might be worth mentioning. (2) You can't use "set" because it's from 2.4.
comment:28 by , 16 years ago
Replying to anonymous:
mrts, I like that your approach is more DRY, but it does make it harder to understand what is going on by reading the settings (you have to sort middleware in your head).
Indeed, first-glance readability suffers somewhat. I'd say the "heavier-first" at least looks a little bit better (though is actually counterintuitive).
A minor point to note: http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ . In this case, MIDDLEWARE_CLASSES is a homogeneous constant list, but the entries are hetereogeneous and should be thought of as C structs (path='foo', outbound_weight=x)
. This mental juggling and the common "another constant list" first impression for the latter might be what creates the mental barrier to readability.
Not sure which I prefer. Anyways, two comments: (1) I was opposed to enforcing weight-uniqueness at first, because that makes it hard to insert in the middle, but then I realized that floats would work fine. Might be worth mentioning.
Good point.
(2) You can't use "set" because it's from 2.4.
That is already taken care of with
try: set except NameError: from sets import Set as set
in the beginning of the patch.
Last not least: as far as I can understand, explicit ordering is not required for most users, so they can happily continue writing their settings as they did before. Advanced users should be more prepared for dealing with the sort-middleware-in-your-head problem.
comment:29 by , 16 years ago
Triage Stage: | Accepted → Design decision needed |
---|
comment:30 by , 16 years ago
s/explicit ordering is not required/explicit *outbound* ordering is not required/
in my previous comment.
comment:31 by , 16 years ago
Jacob has picked the dict approach. What needs fixing:
- single instances
- validation: middleware foo handles both requests and responses, but is only registered for one of them due to user error (this is the major downside of using dicts)
- tests
- documentation
I'm still somewhat concerned about the RY and fragility of the dicts (the validation point above), but I like the control and explicitness they provide as Jacob does. I'll deal with single instance and validation as soon as I can.
comment:32 by , 16 years ago
Triage Stage: | Design decision needed → Accepted |
---|
by , 16 years ago
Attachment: | middleware_dict_single_instances_basic_sanity_checks.diff added |
---|
Improved dict middleware setting. Single instances, basic hasattr('process_foo', mw)
checks.
comment:33 by , 16 years ago
An aside note,adding dependency handling would be easy as well:
class FooMiddleware(object): __init__(self, dependencies): if 'SessionMiddleware' not in dependencies: raise ImproperlyConfigured("FooMiddleware depends on SessionMiddleware")
and corresponding mw_class(mw_classnames)
on line where middleware instance is created from the class, where mw_classnames
contains mw_instance.__class__
es for all middleware previously instantiantiated. Let me know if this is needed.
by , 16 years ago
Attachment: | middleware_dict_single_instances_basic_sanity_checks-validation.diff added |
---|
Added validation as well. Consider this patch to be a draft of the general ideas, it's not even tested.
comment:34 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
After a long conversation with Malcolm, we realized that this has kinda spun out of control. Recall that the original problem is realy just about the cache middleware; hence [8260].
So, I'm going to call this fixed, and shy away from any of these bigger problems. Why? I really can't come up with any example of why you'd need to customize middleware ordering in this manner. On top of that, all of the proposed syntaxes are confusing and error-prone. Finally, if there is another case like the cache middleware, it's really quite easy to split the behavior into two middleware classes.
comment:35 by , 16 years ago
Jacob, I have a final proposal here. There could be functions inbound_only and outbound_only that, given a middleware path, return a middleware class with only the appropriate half of the functions. To solve something like #5691, a user could do the following:
from django.middleware import inbound_only, outbound_only MIDDLEWARE_CLASSES = ( outbound_only('django.middleware.cache.CacheMiddleware'), 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.locale.LocaleMiddleware', inbound_only('django.middleware.cache.CacheMiddleware'), )
This is simpler to write and maintain than the previous choices, while also not requiring you to split up any middleware. It may also be useful in cases beyond ordering, like if you only *want* half of a middleware for whatever reason.
comment:36 by , 16 years ago
Oops, sorry, I hadn't looked at [8260] which splits up the CacheMiddleware in trunk. I'm happy with this decision.
comment:37 by , 16 years ago
It's not the cleanest solution (require to split up a functional unity only because there is no proper control of processing order), but practicality beats purity. All hail the decision :)
comment:38 by , 16 years ago
Perhaps we should continue discussing this after 1.0 (where "discussing" means trying to come up with a nice and clean design)?
comment:39 by , 16 years ago
Cc: | added |
---|
From #733:
And it is a case for #730: the LocaleMiddleware needs in process_request access to the session, so must come _after_ SessionMiddleware. But SessionMiddleware and LocaleMiddleware must come _after_ CacheMiddleware, so that the caching sees the Vary header in process_response (due to the reversed order of middleare). That's exactly the kind of "order deadlock" the user can't solve without giving explicit middleware order.
Another way would be to not reverse the order in the process_response phase, but to allways run middleware in the same defined order. What middleware does actually require the reversed order in process_response? Does it really make sense (I believed it previously, but am not sure myself any more)? It _is_ irritating to the user, heck, it's even irritating to me ...