Code

Opened 9 years ago

Closed 6 years ago

Last modified 3 years ago

#730 closed enhancement (fixed)

more explicit middlware ordering

Reported by: hugo Owned by: andrewbadr
Component: Core (Other) Version:
Severity: normal Keywords:
Cc: oliver@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: UI/UX:

Description

In a django-developer discussion the problem of middlware ordering came up. I think a solution might be to have an optional way to give an explicit middleware ordering for the response phase.

This could be done in a way so that if the user doesn't give an explicit ordering, the system automatically uses the reversed middleware list - that's the behaviour we have now.

But if the users defines a setting for middlewar response order, the system could check that the content is identical in content, but can be different in order. That way people can give explicit order and response handling would be much more explicit and understandable.

Attachments (8)

mw-depends.diff (3.6 KB) - added by Rob Hudson <treborhudson@…> 6 years ago.
Adding example dependencies and testing dependencies are met during management's validate.
explicit_middleware_ordering_t730_r8166.diff (4.7 KB) - added by andrewbadr 6 years ago.
middleware_dict__ticket_730__rev_8168.patch (7.5 KB) - added by __hawkeye__ 6 years ago.
explicit_middleware_ordering_t730_r8166_v2.diff (4.6 KB) - added by andrewbadr 6 years ago.
middleware_ordering_with_path-weight_tuples.diff (2.2 KB) - added by mrts 6 years ago.
Note that the patch can be made backwards-compatible by adding additional isinstance(middleware, tuple) checks. Whether that is good remains open.
middleware_ordering_with_path-weight_tuples-backwards-compatible.diff (4.6 KB) - added by mrts 6 years ago.
Added backwards-compatibility and uniqueness checks. All tests pass as of r8219. See notes below.
middleware_dict_single_instances_basic_sanity_checks.diff (8.8 KB) - added by mrts 6 years ago.
Improved dict middleware setting. Single instances, basic hasattr('process_foo', mw) checks.
middleware_dict_single_instances_basic_sanity_checks-validation.diff (10.4 KB) - added by mrts 6 years ago.
Added validation as well. Consider this patch to be a draft of the general ideas, it's not even tested.

Download all attachments as: .zip

Change History (48)

comment:1 Changed 9 years ago by hugo

From #733:

And it is a case for #730: the LocaleMiddleware needs in process_request access to the session, so must come _after_ SessionMiddleware. But SessionMiddleware and LocaleMiddleware must come _after_ CacheMiddleware, so that the caching sees the Vary header in process_response (due to the reversed order of middleare). That's exactly the kind of "order deadlock" the user can't solve without giving explicit middleware order.

Another way would be to not reverse the order in the process_response phase, but to allways run middleware in the same defined order. What middleware does actually require the reversed order in process_response? Does it really make sense (I believed it previously, but am not sure myself any more)? It _is_ irritating to the user, heck, it's even irritating to me ...

comment:2 Changed 9 years ago by hugo

Taking up the idea of Sune in the django-developer list (about splitting middleware whose process_request and process_response order conflict), would it maybe really be simpler to just allways use the same order for middleware, instead of reversing it in process_response? At least it would make it obvious again in what order they are handled. And if the order in process_response can't be used in the same order as in process_request, maybe the solution really would be to just split that middleware.

I think I know what I dislike about the current solution: many of the django-delivered middleware is order-sensitive in the process_response phase, while the process_request phase just is some setup step (or completely missing). But we still order them with regard to the handling of process_request ...

comment:3 Changed 9 years ago by adrian

I'm all for changing the middleware algorithm so that it doesn't reverse it in process_response. That was Jacob's original idea when he set up the system -- maybe he can shed some light on it?

comment:4 Changed 9 years ago by sune.kirkeby@…

Uhmm... The cache-middleware depends on this feature, so please don't change it without a very good reason.

comment:5 Changed 9 years ago by anonymous

sune: actually the CacheMiddleware doesn't rely on the specific reversed order - it just relies on _one_ defined order, so users know where to put it in the stack of middleware :-)

comment:6 Changed 7 years ago by SmileyChris

  • Triage Stage changed from Unreviewed to Design decision needed

comment:7 Changed 7 years ago by ubernostrum

#749 was a duplicate, but has some discussion that's worth looking at.

comment:8 Changed 7 years ago by jacob

  • Triage Stage changed from Design decision needed to Someday/Maybe

At this point, changing the default response middleware ordering could break a huge amount of code. I wouldn't be opposed to something like a RESPONSE_MIDDLEWARE_ORDERING setting (or whatever), but so far at least I haven't seen a huge need for it.

comment:9 Changed 7 years ago by trevor

Alternatively, ond could list explicit dependencies directly in the middleware classes themselves, and have the ordering managed automatically.

Changed 6 years ago by Rob Hudson <treborhudson@…>

Adding example dependencies and testing dependencies are met during management's validate.

comment:10 Changed 6 years ago by Rob Hudson <treborhudson@…>

  • Needs documentation set
  • Patch needs improvement set

While it doesn't solve all the concerns brought up with this ticket, the mw-depends.diff patch shows adding and testing of dependencies of middleware. I originally set out to build a middleware graph and test for cycles (bad) and dependencies but stumbled upon this solution which was much simpler. So for tonight, at least, that's where I'm stopping. Note: Not all the built-in middleware classes have the depends attribute added -- I consider this not done.

As sample output, I've reversed the session and auth middleware to trigger an error on manage.py validate and here is what it looks like:

django.contrib.auth.middleware.AuthenticationMiddleware: Dependency on "django.contrib.sessions.middleware.SessionMiddleware" 
not met.  Check to make sure the dependent middleware is included in MIDDLEWARE_CLASSES and is listed above AuthenticationMiddleware.

comment:11 follow-up: Changed 6 years ago by jacob

Looks pretty nice to me. Two improvements I'd make:

  • I'm not sure that continue'ing for invalid entries is correct; might be better to complain about the incorrect entry. Not sure here, though - is there a reason you choose to do it this way?
  • I'd take away the empty "depends" declarations in a few of the classes - looks cleaner to simply not provide it if it's not needed.

comment:12 in reply to: ↑ 11 Changed 6 years ago by Rob Hudson <treborhudson@…>

Replying to jacob:

I'm not sure that continue'ing for invalid entries is correct; might be better to complain about the incorrect entry. Not sure here, though - is there a reason you choose to do it this way?

Simply because I wasn't ready to think about the right thing to do. Most of the middleware loading code is a DRY violation since it's directly copied from django.core.handlers.base.BaseHandler's load_middleware method, except for this method sets up which type of middleware it is (e.g. request, response, etc.) Would it make sense to abstract this out a little so it's not repeated -- make a load_middleware_instance(mw_path) method that returns an instance of the middleware given the path? Then this method can raise whatever exceptions it finds and the validation can put it in a try/except block. That's a bit of refactoring I wasn't ready to commit to last night and would like some encouragement before I go down that path. If I were to pull it out so both bits of code can re-use it, where would be a proper place for it to be?

I'd take away the empty "depends" declarations in a few of the classes - looks cleaner to simply not provide it if it's not needed.

I can remove it, but I like it because then the developer will know that this middleware has been "updated" to specify what the middleware depends on and it has no dependencies.

Thanks for the feedback.

comment:13 Changed 6 years ago by jacob

  • Triage Stage changed from Someday/Maybe to Accepted

comment:14 follow-up: Changed 6 years ago by Alberto García Hierro <fiam@…>

I think the approach taken by Gentoo's init script would fit well here. Basically, there are three keywords inits scripts can use: depends, uses and after (I think all of them are self explanatory). This allows, for example, to mark a middleware class which need to be the last one as " after = '*' ". Otherwise, if you made it dependant on every Django middleware class, you're forcing everyone using custom middleware to patch Django (or allowing them to misconfigure the middleware stack).

comment:15 Changed 6 years ago by Alberto García Hierro <fiam@…>

Well, I forgot about "before", which is a supported keyword and would also make sense here.

comment:16 in reply to: ↑ 14 Changed 6 years ago by Rob Hudson <treborhudson@…>

Replying to Alberto García Hierro <fiam@rm-fr.net>:

I think the approach taken by Gentoo's init script would fit well here. Basically, there are three keywords inits scripts can use: depends, uses and after (I think all of them are self explanatory). This allows, for example, to mark a middleware class which need to be the last one as " after = '*' ". Otherwise, if you made it dependant on every Django middleware class, you're forcing everyone using custom middleware to patch Django (or allowing them to misconfigure the middleware stack).

I agree. I started with the depends stuff and when I started thinking about the caching middleware it started breaking down, especially with middleware that alters the Vary header.

comment:17 Changed 6 years ago by andrewbadr

  • Owner changed from nobody to andrewbadr

I'm working on a patch to allow distinct request and response orderings.

Changed 6 years ago by andrewbadr

comment:18 Changed 6 years ago by andrewbadr

  • Has patch set
  • Needs tests set

This ticket shows the approach I decided to take, which is to have an optional "OUTGOING_MIDDLEWARE_ORDER" setting or default to current behavior (reverse). The patch seems to work correctly, but doesn't have tests or documentation yet.

I started implementing something like Gentoo's init scripts described above, but decided it would be too complicated to maintain. At the minimum, every middleware would need two new fields -- comes before and comes after -- and you could only do those if you knew the path to the other middlewares you were talking about. I think it would be fine for the built-in ones, but when you consider people using multiple third-party middlewares that don't know about each other, you start need extra middleware dependency information in your settings. Furthermore, it makes it hard to understand what your code is actually doing -- not everyone wants to do a topological sort.

That's why I chose this more straightforward approach, even though it is somewhat error-prone to manually reverse MIDDLEWARE_CLASSES.

#5691 is blocked on this.

comment:19 Changed 6 years ago by __hawkeye__

In discussion with Malcolm, we discussed some concern about the fact that OUTGOING_MIDDLEWARE_ORDER setting will always be necessary.

Users don't have the option to ignore OUTGOING_MIDDLEWARE_ORDER if they aren't impacted by this problem, since it will have to exist in global_settings and will have to be overridden.

This also means that we are guaranteed to break any existing code where people have explicitly specified MIDDLEWARE_CLASSES... and it will happen in a non-obvious way (it won't explode, it'll just go wonky).

We're still discussing this here and it looks like every solution comes with its own particularly hairy problems.

comment:20 Changed 6 years ago by andrewbadr

Could we use a guard value in global_settings? That is, ignore OUTGOING_MIDDLEWARE_ORDER if it's None or [] or whatever the default is?

comment:21 Changed 6 years ago by __hawkeye__

Here is a "different but similar" approach to that taken by andrewbadr.

I've allowed MIDDLEWARE_CLASSES to be either a tuple/list or dict.

If it is a tuple/list, then it is treated as it's presently -- down for request/back up for response (and exceptions).

If it is a dict, then it looks for keys labeled request, response, exception, and view and uses them in their natural order.

Downsides

  1. May cause DRY violation, though many middleware only care about 1 of the 4 types.
  2. Certainly doesn't lead to an 'attractive' settings file.
  3. Exposes the complexity of the problems discussed here to the developer. (Requires the developer to solve ordering problems with middleware)

Upsides

  1. Solves the underlying problem of different orderings being needed for different pieces of the request lifecycle
  2. Maintains explicit ordering.
  3. Users still has complete control over ordering (as opposed to depends/before/etc)
  4. Backwards compatible (people using the existing MIDDLEWARE_CLASSES specification will still be able to)
  5. Only one setting -- rather than coupling MIDDLEWARE_CLASSES and OUTGOING_MIDDLEWARE_ORDER
  6. Addresses all middleware paths (rather than just response)
  7. Default is still pretty painless
  8. Should be intuitive as long as the developer looks at the example

Regardless, we still need tests and docs.

Changed 6 years ago by __hawkeye__

comment:22 Changed 6 years ago by andrewbadr

A couple issues:

1) (This applies to both of our patches.) Each middleware is instantiated multiple times, so it's impossible to share data from the inbound to the outbound processing as some middlewares currently do.

2) If I want to switch from tuple/list to a dict, I have to look at every middleware I'm using (Session, Authentication...) and figure out which methods it implements so that I know which dict keys to add it to.

Changed 6 years ago by andrewbadr

comment:23 Changed 6 years ago by andrewbadr

Just uploaded a revised version of the approach used by my first patch. Fixes issues (1) above and another bug when no outgoing order is specified.

Something I didn't mention is that this patch requires OUTBOUND_MIDDLEWARE_ORDER to have the exact same contents as MIDDLEWARE_CLASSES.

comment:24 Changed 6 years ago by mrts

Another way to solve this would be to use (middleware, outbound_priority) tuples in MIDDLEWARE_CLASSES.

Inbound ordering is specified by the order of elements in MIDDLEWARE_CLASSES as previously. Outbound ordering would be explicitly specified by the outbound_priority value.

E.g. the following tuple

MIDDLEWARE_CLASSES = (
    ('django.middleware.common.CommonMiddleware', 2),
    ('django.contrib.sessions.middleware.SessionMiddleware', 3),
    ('django.contrib.auth.middleware.AuthenticationMiddleware', 1),
    ('django.middleware.doc.XViewMiddleware', 0),
)

would be ordered as follows:

>>> def outbound_order():
...     outbound = [ (mw[1], mw[0]) for mw in MIDDLEWARE_CLASSES ]
...     outbound.sort()
...     return [ mw[1] for mw in outbound ]
... 
>>> outbound_order()
['django.middleware.doc.XViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware']

comment:25 Changed 6 years ago by mrts

Although this looks uglier than OUTBOUND_MIDDLEWARE_ORDER, it overcomes the problems __hawkeye__ outlined in #730:comment:19 and avoids having to duplicate all middleware in OUTBOUND_MIDDLEWARE_ORDER if the latter is required. Duplication violates DRY and is fragile -- if you add or remove something from MIDDLEWARE_CLASSES, you have to remember to update OUTBOUND_MIDDLEWARE_ORDER accordingly.

The multiple instantiation problem mentioned above can easily be avoided by creating a list from the tuple that contains (middleware_instance, outbound_priority) tuples first and use these instances to fill inbound and outbound middleware lists.

Perhaps a more eye-pleasing (but more obscure?) way to specify ordering is to use "weight" instead of "priority", i.e. "heavier" items will be processed before lighter ones:

MIDDLEWARE_CLASSES = (
    ('django.middleware.common.CommonMiddleware', 0),
    ('django.contrib.sessions.middleware.SessionMiddleware', 1),
    ('django.contrib.auth.middleware.AuthenticationMiddleware', 3),
    ('django.middleware.doc.XViewMiddleware', 2),
)

and correspondind outbound processing order:

[('django.contrib.auth.middleware.AuthenticationMiddleware', 3),
('django.middleware.doc.XViewMiddleware', 2),
('django.contrib.sessions.middleware.SessionMiddleware', 1),
('django.middleware.common.CommonMiddleware', 0)]

Note that the outbound_order function above is just an illustration. An actual implementation should check for outbound_priority uniqueness.

Changed 6 years ago by mrts

Note that the patch can be made backwards-compatible by adding additional isinstance(middleware, tuple) checks. Whether that is good remains open.

Changed 6 years ago by mrts

Added backwards-compatibility and uniqueness checks. All tests pass as of r8219. See notes below.

comment:26 Changed 6 years ago by mrts

Notes on the last patch:

  • for backwards-compatibility, MIDDLEWARE_CLASSES is processed exactly like before if they are not tuples
  • if they are tuples, they are interpreted as (middleware_instance, weight) as specified above (i.e. heavier items processed first when outbound)
  • weight values have to be unique to avoid typos-confusion-whatnot
  • only one instance is created of each middleware class
  • tests pass

comment:27 follow-up: Changed 6 years ago by anonymous

mrts, I like that your approach is more DRY, but it does make it harder to understand what is going on by reading the settings (you have to sort middleware in your head). Not sure which I prefer. Anyways, two comments: (1) I was opposed to enforcing weight-uniqueness at first, because that makes it hard to insert in the middle, but then I realized that floats would work fine. Might be worth mentioning. (2) You can't use "set" because it's from 2.4.

comment:28 in reply to: ↑ 27 Changed 6 years ago by mrts

Replying to anonymous:

mrts, I like that your approach is more DRY, but it does make it harder to understand what is going on by reading the settings (you have to sort middleware in your head).

Indeed, first-glance readability suffers somewhat. I'd say the "heavier-first" at least looks a little bit better (though is actually counterintuitive).

A minor point to note: http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/ . In this case, MIDDLEWARE_CLASSES is a homogeneous constant list, but the entries are hetereogeneous and should be thought of as C structs (path='foo', outbound_weight=x). This mental juggling and the common "another constant list" first impression for the latter might be what creates the mental barrier to readability.

Not sure which I prefer. Anyways, two comments: (1) I was opposed to enforcing weight-uniqueness at first, because that makes it hard to insert in the middle, but then I realized that floats would work fine. Might be worth mentioning.

Good point.

(2) You can't use "set" because it's from 2.4.

That is already taken care of with

try: 
    set 
except NameError: 
    from sets import Set as set

in the beginning of the patch.

Last not least: as far as I can understand, explicit ordering is not required for most users, so they can happily continue writing their settings as they did before. Advanced users should be more prepared for dealing with the sort-middleware-in-your-head problem.

comment:29 Changed 6 years ago by anonymous

  • Triage Stage changed from Accepted to Design decision needed

comment:30 Changed 6 years ago by mrts

s/explicit ordering is not required/explicit *outbound* ordering is not required/ in my previous comment.

comment:31 Changed 6 years ago by mrts

Jacob has picked the dict approach. What needs fixing:

  • single instances
  • validation: middleware foo handles both requests and responses, but is only registered for one of them due to user error (this is the major downside of using dicts)
  • tests
  • documentation

I'm still somewhat concerned about the RY and fragility of the dicts (the validation point above), but I like the control and explicitness they provide as Jacob does. I'll deal with single instance and validation as soon as I can.

comment:32 Changed 6 years ago by jacob

  • Triage Stage changed from Design decision needed to Accepted

Changed 6 years ago by mrts

Improved dict middleware setting. Single instances, basic hasattr('process_foo', mw) checks.

comment:33 Changed 6 years ago by mrts

An aside note,adding dependency handling would be easy as well:

class FooMiddleware(object):
  __init__(self, dependencies):
    if 'SessionMiddleware' not in dependencies:
     raise ImproperlyConfigured("FooMiddleware depends on SessionMiddleware")

and corresponding mw_class(mw_classnames) on line where middleware instance is created from the class, where mw_classnames contains mw_instance.__class__es for all middleware previously instantiantiated. Let me know if this is needed.

Changed 6 years ago by mrts

Added validation as well. Consider this patch to be a draft of the general ideas, it's not even tested.

comment:34 Changed 6 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

After a long conversation with Malcolm, we realized that this has kinda spun out of control. Recall that the original problem is realy just about the cache middleware; hence [8260].

So, I'm going to call this fixed, and shy away from any of these bigger problems. Why? I really can't come up with any example of why you'd need to customize middleware ordering in this manner. On top of that, all of the proposed syntaxes are confusing and error-prone. Finally, if there is another case like the cache middleware, it's really quite easy to split the behavior into two middleware classes.

comment:35 Changed 6 years ago by andrewbadr

Jacob, I have a final proposal here. There could be functions inbound_only and outbound_only that, given a middleware path, return a middleware class with only the appropriate half of the functions. To solve something like #5691, a user could do the following:

from django.middleware import inbound_only, outbound_only

MIDDLEWARE_CLASSES = ( 
    outbound_only('django.middleware.cache.CacheMiddleware'),
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    inbound_only('django.middleware.cache.CacheMiddleware'),
)

This is simpler to write and maintain than the previous choices, while also not requiring you to split up any middleware. It may also be useful in cases beyond ordering, like if you only *want* half of a middleware for whatever reason.

comment:36 Changed 6 years ago by andrewbadr

Oops, sorry, I hadn't looked at [8260] which splits up the CacheMiddleware in trunk. I'm happy with this decision.

comment:37 Changed 6 years ago by mrts

It's not the cleanest solution (require to split up a functional unity only because there is no proper control of processing order), but practicality beats purity. All hail the decision :)

comment:38 Changed 6 years ago by mrts

Perhaps we should continue discussing this after 1.0 (where "discussing" means trying to come up with a nice and clean design)?

comment:39 Changed 6 years ago by anonymous

  • Cc oliver@… added

comment:40 Changed 3 years ago by jacob

  • milestone 1.0 beta deleted

Milestone 1.0 beta deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.