Opened 14 years ago

Closed 10 years ago

#15089 closed New feature (fixed)

contrib.sites and multitenancy

Reported by: legutierr Owned by: Florian Apolloner
Component: contrib.sites Version:
Severity: Normal Keywords:
Cc: django@…, James Lecker Jr, Jari Pennanen, linovia, NicoEchaniz, dcwatson@…, hvdklauw@…, tgecho, fernando.sanchez@…, florian+django@…, krzysiumed@…, tom@…, net147, charette.s@…, w2lkm2n@…, simonkagwe, riccardo.magliocchetti@…, Giacomo Graziosi Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With reference to this thread:

Recent changes to django trunk made in [13980] create a new site-retrieval function, get_current_site(), which supplants Site.objects.get_current(). However, in spite of the fact that get_current_site() takes a request object as an argument, that request object is not being used to retrieve the current site; only the SITE_ID setting is used. Furthermore, there is a need to allow users to register their own site-retrieval functions, should they desire different functionality .

In this change, however, backwards compatibility must be maintained:

  • if the SITE_ID setting is implemented, then that SITE_ID must be used to retrieve the site record in all cases where contrib.sites is installed.
  • when contrib.sites is not installed, a RequestSite object should be returned

Specifically, the modification should contain the following elements:

  • add a deprication warning to Site.objects.get_current_site()
  • add a method get_site_for_request() to Site.objects; this method will inspect the request, using request.get_host(), and will then return the site object matching the request.
  • add a method get_site_for_id() to Site.objects, which method can be used by get_current_site() in the case of SITE_ID being populated
  • modify get_current_site() to handle four cases:
    1. If SITE_CALLBACK (a new setting) is defined, return the result of passing the request into that callable
    2. Else if contrib.sites is not installed, return a RequestSite
    3. Else if SITE_ID is defined, use Site.objects.get_site_for_id() to get the site conforming to SITE_ID
    4. Otherwise, use Site.objects.get_site_for_request() to retrieve the site that conforms to the request.
  • add new tests to sites.tests
  • modify the contrib.sites documentation to reflect the new approach and the new setting, SITE_CALLBACK

In addition, I would like to suggest two additional changes that are not documented in the above-referenced django-developers group thread:

  1. Move the get_current_site() method out of the models.py file, and into __init__.py. The thinking behind this is that referencing settings inside a models.py file has a bit of a code smell, and that get_current_site(), now being the key control point of the library, should exist at a dependency level above the other modules.
  2. Add an additional keyword argument to get_current_site(), specifically default_to_request_site, with default value False. In the event that default_to_request_site is false, an error would be raised if Sites is not installed. The thinking behind this change is that third-party apps, the primary potential users of multitenancy implemented by contrib.sites, will most likely link their models to the Site object via a ForeignKey. As a result, RequestSite objects will produce errors if they are used, and additional logic will have to be implemented to special-case the handling of RequestSite objects in such a situation.

These changes, if pursued, would also require modifications in the other contrib apps that use RequestSite.

Attachments (5)

15089_patch_a.diff (9.5 KB ) - added by legutierr 14 years ago.
A first version of a patch, for review.
15089_patch_b.diff (40.4 KB ) - added by legutierr 14 years ago.
Patch that includes all specified functionality, tests, and documentation changes. Tests for sites and sites_framework pass, but other tests have not been run.
15089-missing-site_id.diff (5.8 KB ) - added by Claude Paroz 12 years ago.
get site from request when no SITE_ID
0001-By-using-the-request-to-do-the-Site-matching.patch (6.3 KB ) - added by rm_ 11 years ago.
Forward port the patch to git master
0001-Make-settings.SITE_ID-optional-for-django.contrib.si.patch (6.4 KB ) - added by rm_ 10 years ago.
Avoid the domain, port split already handled by request.get_host(), better commit message

Download all attachments as: .zip

Change History (63)

comment:1 by legutierr, 14 years ago

Component: UncategorizedContrib apps

comment:2 by legutierr, 14 years ago

Version: 1.21.3-beta

comment:3 by Russell Keith-Magee, 14 years ago

Keywords: blocker added
milestone: 1.3
Triage Stage: UnreviewedDesign decision needed

This is a blocker because it relates to a problem/limitation with a feature added in the 1.3 timeframe; marking DDN because the design discussion hasn't been fully resolved.

comment:4 by anonymous, 14 years ago

Cc: django@… added

comment:5 by Gabriel Hurley, 14 years ago

The premise of this ticket seems good to me, though there are definitely some sticking points in the design.

Expanding the way get_current_site() does its lookups seems like the most worthwhile portion of this proposal, so I can definitely get behind that.

Because Site.objects.get_current_site() is a model manager method I don't see deprecating it as a necessity, but I also don't mind it for the sake of avoiding confusion.

Adding a Site.objects.get_site_for_id() method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO) unless it has some other handling built in which you haven't explained.

Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users.

I'm -1 on adding a default_to_request_site parameter to get_current_site(), though. The entire point of get_current_site() is that it should always return a Site or RequestSite (unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects directly. This may, in fact, be an argument in favor of *not* deprecating Site.objects.get_current_site() if this use case is of real concern.

Those are my current thoughts. All in all I'm +0 on this proposal.

I'll leave separate feedback (not directly related to the proposal on this ticket) in the mailing list thread.

comment:6 by legutierr, 14 years ago

Thanks for your comments, Gabriel.

You write:

Adding a Site.objects.get_site_for_id() method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO) unless it has some other handling built in which you haven't explained.

The cacheing logic would be moved from Site.objects.get_current() to Site.objects.get_site_for_id(). This way, the new implementation would be computationally equivalent to the old implementation. The interfaces would be re-factored, but the case where SITE_ID is used to retrieve the site would still have the same underlying logic.

You write:

Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users

There are just two ways to approach the precedence of SITE_ID vs. the request in determining what site to retrieve. SITE_ID can act as a default, or SITE_ID can act as the override. What you are suggesting, making SITE_ID required, would force it to act as a default. Although this is a viable option, of course, it seems less intuitive to me than causing it to act as an override. Either way, the decision should be about whether it is more intuitive for SITE_ID to be a default/fallback, or for it to override whatever might be in the request. Also, I do not see what would be automagical about using the request to lookup the site, that's the whole point of this ticket.

You write:

I'm -1 on adding a default_to_request_site parameter to get_current_site(), though. The entire point of get_current_site() is that it should always return a Site or RequestSite (unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects directly.

Given that among the proposed requirements is the idea that users should be able to provide their own overriding SITE_CALLBACK method, which could return anything at all, it seems futile to narrowly define get_current_site() in terms of what types of objects it returns. Nevertheless, it is important, I think, to enable 3rd party app designers to be able to control whether or not the object they are receiving is an ORM object.

In most cases, the third-party app will want to link to the ORM object either via a ForeignKey, or more likely (given that the specific class cannot be ensured) through a generic relation. In few cases will a RequestSite or some other mock object be acceptable. Direct calls to SiteManager methods will also be problematic, for a number of reasons: if SITE_CALLBACK is defined, the return value may not be a Site object; direct calls to SiteManager methods will circumvent the logic managing the use of SITE_ID vs. the request in retrieving the Site object; a central method is required in order to encourage consistency of approach.

Perhaps the correct name of this parameter is require_orm_object or something like that, instead, given that it should also inform what is returned by SITE_CALLBACK.

comment:7 by Jannis Leidel, 14 years ago

In case anyone is interested, I've started to put my sites app hacks in a 3rd party app (https://github.com/jezdez/django-sites-tools/). Currently, it only provides a custom middleware (and a view decorator) to add a request.site object and a model manager that can filter objects depending on the current site. I think adding a SITES_DETECTION_BACKEND setting that is used by the middleware would be easy to implement.

comment:8 by James Lecker Jr, 14 years ago

Cc: James Lecker Jr added

by legutierr, 14 years ago

Attachment: 15089_patch_a.diff added

A first version of a patch, for review.

comment:9 by legutierr, 14 years ago

The above patch is not complete, by any means. It lacks complete testing, and contains no modifications at all to the documentation. However, I do believe it to reflect the views of the people who have voiced their opinions here and in the original thread.

comment:10 by legutierr, 14 years ago

Has patch: set

comment:11 by legutierr, 14 years ago

Patch needs improvement: set

comment:12 by legutierr, 14 years ago

Needs documentation: set
Needs tests: set

comment:13 by Jannis Leidel, 14 years ago

Awesome!

comment:14 by jorgeecardona, 14 years ago

Hi, It would be great if user could be isolated between sites, I created a Ticket with some patches that made this, and unittest for it here: http://code.djangoproject.com/ticket/15110 , this woudl help to improve the multitenancy feature in django. I hope it helps.

Regards.

by legutierr, 14 years ago

Attachment: 15089_patch_b.diff added

Patch that includes all specified functionality, tests, and documentation changes. Tests for sites and sites_framework pass, but other tests have not been run.

comment:15 by legutierr, 14 years ago

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Although additional feedback is obviously needed, I think that this patch provides everything required to move this ticket forward, including test coverage for all the new features, as well as changes to the documentation reflecting the new approach.

comment:16 by Harro, 14 years ago

Cc: Harro added

comment:17 by Carl Meyer, 14 years ago

Keywords: blocker removed
milestone: 1.3
Patch needs improvement: set

I'm not sure why this is marked as 1.3 blocker. Much as I'd like to see it added, dynamic multitenancy is a new feature for contrib.sites that belongs in 1.4 at this point. I haven't seen anywhere in this thread a bug identified in a 1.3 feature. It's not a bug that the new get_current_site function doesn't support dynamic multitenancy, as contrib.sites has never supported that, and get_current_site wasn't intended to add support for it (it was intended to add a RequestSite fallback when contrib.sites is not installed, and it already does that just fine). So I'm removing the 1.3 milestone and blocker flag - Russell, please feel free to reverse if there's something obvious I'm missing.

I've posted some more design-decisiony thoughts on the feature and current patch here: http://groups.google.com/group/django-developers/msg/c7e3ee09837851c3

comment:18 by Jari Pennanen, 14 years ago

Cc: Jari Pennanen added

comment:19 by linovia, 14 years ago

It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.

Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?

comment:20 by linovia, 14 years ago

Cc: linovia added

in reply to:  19 comment:21 by Carl Meyer, 14 years ago

Replying to linovia:

It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.

Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?

Yes, putting the request (or the request host, or a site id picked dynamically in a middleware) into threadlocals would be one approach (in fact, its the method currently chosen by third-party apps implementing dynamic multitenancy for Django), but Django's philosophy has generally been to minimize usage of threadlocals: if you need information from the request, you should pass it in to where it's needed. Thus the effort here to find a threadlocals-free approach if possible.

comment:22 by NicoEchaniz, 14 years ago

Cc: NicoEchaniz added

comment:23 by Gabriel Hurley, 14 years ago

Component: Contrib appscontrib.sites

comment:24 by James Addison, 14 years ago

Severity: Normal
Type: New feature

comment:25 by Dan Watson, 14 years ago

Cc: dcwatson@… added
Easy pickings: unset

comment:26 by Harro, 14 years ago

Cc: Harro removed
UI/UX: unset

comment:27 by Harro, 14 years ago

Cc: hvdklauw@… added

comment:28 by tgecho, 13 years ago

Cc: tgecho added

comment:29 by fernando.sanchez@…, 13 years ago

Cc: fernando.sanchez@… added

comment:30 by Florian Apolloner, 13 years ago

Cc: florian+django@… added
Version: 1.3-beta

I would keep get_current_site as is (eg don't perform any magic regarding SITE_ID) and just add a setting to allow users to add their own backend to resolve the current site. Looking up a Site object based on the request is a twoliner then.

Is there anything else which needs to be done to get this ticket rolling again?

comment:31 by Florian Apolloner, 13 years ago

Owner: changed from nobody to Florian Apolloner

comment:32 by Florian Apolloner, 13 years ago

comment:33 by Christopher Medrela, 13 years ago

Cc: krzysiumed@… added

comment:34 by Thomas Schreiber, 12 years ago

Cc: tom@… added

comment:35 by net147, 12 years ago

Cc: net147 added

comment:36 by anonymous, 12 years ago

I like the latest patch as it's lightweight and somehow mitigates the issue, it'll reduce to the minimum the number of hacks required to achieve multitenancy.

It's worth noting that there are a few places in django that use SiteManager.get_current() instead of get_current_site(). A quick scan gave me sitemaps, comment feeds and comment moderation. These are usually from functions that don't have access to the request. I'd say this patch should include some refactoring to completely remove get_current() usage. It should probably be documented that any use of get_current() in third-party apps will break multitenancy, instead of recommending get_current() as it's currently the case, it should be strongly advised against.

Proper multitenancy is still not achieved, even with this patch, because it doesn't address things like MEDIA_ROOT, MEDIA_URL, EMAIL_SUBJECT_PREFIX, DEFAULT_FROM_EMAIL that are needed for most multitenant usages.

For completeness I'll also mention TIME_ZONE, LANGUAGE_CODE and ROOT_URLCONF that are important elements of multitenancy as well, even though it's already possible to affect their value on a per request basis without resorting to terrible hacks.

To solve this issue some people resort to wrapping settings and storing their values in threaded locals, at one point or another it breaks as some code don't like wrapped values. Since django is fundamentally broken when it comes to multitenancy I suggest to people that really need it and that can afford single-thread installations to simply modify settings on the fly. It's frowned upon but other solutions are just as bad.

comment:37 by Simon Charette, 12 years ago

Cc: charette.s@… added

comment:38 by Claude Paroz <claude@…>, 12 years ago

In 6c2faaceb0482267cec19da0ff432984028f9d0c:

Made more extensive use of get_current_site

Refs #15089

by Claude Paroz, 12 years ago

Attachment: 15089-missing-site_id.diff added

get site from request when no SITE_ID

comment:39 by Claude Paroz, 12 years ago

I've just uploaded an intermediary patch which make get_current_site always use the request to find out the current site if SITE_ID is not in the settings. Of course, having a way to provide a custom method to retrieve the current site is still useful, but maybe in a second stage?

comment:40 by Claude Paroz <claude@…>, 12 years ago

In 1ce4aedcefb68086918adc4137d75a6f2c0bd1f2:

Prevented flatpage view from directly accessing settings.SITE_ID

Refs #15089

comment:41 by nebstrebor, 12 years ago

Any chance this ends up in 1.5, or is it even on anyone's radar? Just wondering since we're about to convert our site to multi-tenant architecture and have been holding off a while, hoping django would become more friendly to it.

comment:42 by Claude Paroz, 12 years ago

Review carefully the latest patch, test it, tell us if it effectively solve issues for you. This will help us to go forward.

comment:43 by nebstrebor, 12 years ago

Will do. We'll be starting to implement our change-over to multitenant probably in about 1 month's time, and I will see what the patch does and does not provide that we need and let you know.

comment:44 by Aymeric Augustin, 12 years ago

Triage Stage: Design decision neededAccepted

Dynamic multi-tenancy is a commonly requested feature, no one is against the concept.

comment:45 by loic84, 12 years ago

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.
  • Random methods that don't get the request as parameter even though the calling method had access to it.
  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

get_current_site() would need to be swappable just like KEY_FUNCTION for the cache framework, so people can define their own switching logic.

Also it would be nice if the stock get_current_site() had logic for getting the SITE_ID from environ, so that management commands can target a specific site. Before someone mentions that you can make a temporary settings.py that overrides theSITE_ID, I'll say that while it works, it's not exactly practical when you have several hundred sites on a multi-tenant setup.

Finally, it's obvious that the perfect solution would also have to deal with settings, but the site switching problem can be solved without addressing it, so let's not condition this feature to a refactoring of settings that might never happen. Developers would still be on their own when it comes to switching settings per site, but this can be handled case by case by people who need it.

comment:46 by Anssi Kääriäinen, 12 years ago

I agree that thread-local sites would make sense in many use cases. I think the whole Site model should be swappable. Then, there could be Site.get_current_site() method by which you can alter what is returned from that method. The default implementation returns site by settings.SITE_ID, but if you use a swappable model, then you can alter the implementation as you wish (middleware + thread-locals is a likely approach here...).

in reply to:  45 ; comment:47 by Florian Apolloner, 12 years ago

Replying to loic84:

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

Absolutely not. First of making the BaseHandler aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one, User for instance works fine without thread locals

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

If your API requires a Site, pass in a Site object where needed.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.

I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.

  • Random methods that don't get the request as parameter even though the calling method had access to it.

That's a problem in the API, not in the sites stuff.

  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

RequestContext?!

get_current_site() would need to be swappable just like KEY_FUNCTION for the cache framework, so people can define their own switching logic.

That was part of my proposal.

Last edited 12 years ago by Florian Apolloner (previous) (diff)

in reply to:  47 comment:48 by loic84, 12 years ago

Replying to apollo13:

Replying to loic84:

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

Absolutely not. First of making the BaseHandler aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one, User for instance works fine without thread locals

Django completely ignores one of the main part of an URL (the domain), it's a blatant omission in my opinion. Having a hook in place in BaseHandler for people or apps that care about that portion of the URL is not the same as strong coupling a contrib app. contrib.sites could simply consume that hook.

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

If your API requires a Site, pass in a Site object where needed.

I'm not talking about my own API, I'm taking about areas of Django over which I have no control.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.

I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.

Indeed you don't, and that's why I don't want to keep the curent SITE_ID in the request but in a thread local. In the shell you have control over 1. the settings, and 2. the env, so you could write SITE_ID=2 ./manage.py whatever.

  • Random methods that don't get the request as parameter even though the calling method had access to it.

That's a problem in the API, not in the sites stuff.

  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

RequestContext?!

What do you mean? Right now you are out of luck if you need your Model.get_absolute_url() to be site-aware, for example to return a fully qualified URL when the site for the current request is different than the site to which your model instance belongs. I don't see how RequestContext helps, even if the request is present in the template context, I won't be able to pass it as an argument to get_absolute_url, so get_absolute_url still has no idea about the current site.

comment:49 by György Kiss, 11 years ago

Cc: w2lkm2n@… added

comment:50 by simonkagwe, 11 years ago

Cc: simonkagwe added

comment:51 by rm_, 11 years ago

Cc: riccardo.magliocchetti@… added

by rm_, 11 years ago

Forward port the patch to git master

comment:52 by Giacomo Graziosi, 11 years ago

Cc: Giacomo Graziosi added

comment:53 by rm_, 11 years ago

Patch needs improvement: unset

Removing the patch needs improvement flag to get the patch in the review queue

comment:54 by Tim Graham, 10 years ago

I would like to move this ticket forward in order to address the concerns in comment 5 of #21648 ('Deprecate "is_admin_site" option of auth.views.password_reset()').

I think Claude's patch is a reasonable start. PR 2460 was also submitted independently without knowledge of this ticket and is quite similar. Is anyone against this direction?

comment:55 by Florian Apolloner, 10 years ago

If that helps us internally, by all means apply it.

by rm_, 10 years ago

Avoid the domain, port split already handled by request.get_host(), better commit message

comment:56 by rm_, 10 years ago

↑ Forward ported the patch to apply cleanly against latest git master.

comment:57 by Tim Graham, 10 years ago

Made some updates to the patch: PR

comment:58 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 32c7d3c061b83e9206ef2bf13fbc302a1998f317:

Fixed #15089 -- Allowed contrib.sites to lookup the current site based on request.get_host().

Thanks Claude Paroz, Riccardo Magliocchetti, and Damian Moore
for contributions to the patch.

Note: See TracTickets for help on using tickets.
Back to Top