Opened 14 years ago
Closed 10 years ago
#15089 closed New feature (fixed)
contrib.sites and multitenancy
Reported by: | legutierr | Owned by: | Florian Apolloner |
---|---|---|---|
Component: | contrib.sites | Version: | |
Severity: | Normal | Keywords: | |
Cc: | django@…, James Lecker Jr, Jari Pennanen, linovia, NicoEchaniz, dcwatson@…, hvdklauw@…, tgecho, fernando.sanchez@…, florian+django@…, krzysiumed@…, tom@…, net147, charette.s@…, w2lkm2n@…, simonkagwe, riccardo.magliocchetti@…, Giacomo Graziosi | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With reference to this thread:
Recent changes to django trunk made in [13980] create a new site-retrieval function, get_current_site()
, which supplants Site.objects.get_current()
. However, in spite of the fact that get_current_site()
takes a request object as an argument, that request object is not being used to retrieve the current site; only the SITE_ID setting is used. Furthermore, there is a need to allow users to register their own site-retrieval functions, should they desire different functionality .
In this change, however, backwards compatibility must be maintained:
- if the SITE_ID setting is implemented, then that SITE_ID must be used to retrieve the site record in all cases where contrib.sites is installed.
- when contrib.sites is not installed, a RequestSite object should be returned
Specifically, the modification should contain the following elements:
- add a deprication warning to
Site.objects.get_current_site()
- add a method
get_site_for_request()
to Site.objects; this method will inspect the request, usingrequest.get_host()
, and will then return the site object matching the request. - add a method
get_site_for_id()
to Site.objects, which method can be used byget_current_site()
in the case of SITE_ID being populated - modify get_current_site() to handle four cases:
- If SITE_CALLBACK (a new setting) is defined, return the result of passing the request into that callable
- Else if contrib.sites is not installed, return a RequestSite
- Else if SITE_ID is defined, use
Site.objects.get_site_for_id()
to get the site conforming to SITE_ID - Otherwise, use
Site.objects.get_site_for_request()
to retrieve the site that conforms to the request.
- add new tests to sites.tests
- modify the contrib.sites documentation to reflect the new approach and the new setting, SITE_CALLBACK
In addition, I would like to suggest two additional changes that are not documented in the above-referenced django-developers group thread:
- Move the
get_current_site()
method out of the models.py file, and into__init__.py
. The thinking behind this is that referencing settings inside a models.py file has a bit of a code smell, and thatget_current_site()
, now being the key control point of the library, should exist at a dependency level above the other modules. - Add an additional keyword argument to get_current_site(), specifically
default_to_request_site
, with default valueFalse
. In the event thatdefault_to_request_site
is false, an error would be raised if Sites is not installed. The thinking behind this change is that third-party apps, the primary potential users of multitenancy implemented by contrib.sites, will most likely link their models to the Site object via a ForeignKey. As a result, RequestSite objects will produce errors if they are used, and additional logic will have to be implemented to special-case the handling of RequestSite objects in such a situation.
These changes, if pursued, would also require modifications in the other contrib apps that use RequestSite.
Attachments (5)
Change History (63)
comment:1 by , 14 years ago
Component: | Uncategorized → Contrib apps |
---|
comment:2 by , 14 years ago
Version: | 1.2 → 1.3-beta |
---|
comment:3 by , 14 years ago
Keywords: | blocker added |
---|---|
milestone: | → 1.3 |
Triage Stage: | Unreviewed → Design decision needed |
comment:4 by , 14 years ago
Cc: | added |
---|
comment:5 by , 14 years ago
The premise of this ticket seems good to me, though there are definitely some sticking points in the design.
Expanding the way get_current_site()
does its lookups seems like the most worthwhile portion of this proposal, so I can definitely get behind that.
Because Site.objects.get_current_site()
is a model manager method I don't see deprecating it as a necessity, but I also don't mind it for the sake of avoiding confusion.
Adding a Site.objects.get_site_for_id()
method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO)
unless it has some other handling built in which you haven't explained.
Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users.
I'm -1 on adding a default_to_request_site
parameter to get_current_site()
, though. The entire point of get_current_site()
is that it should always return a Site
or RequestSite
(unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects
directly. This may, in fact, be an argument in favor of *not* deprecating Site.objects.get_current_site()
if this use case is of real concern.
Those are my current thoughts. All in all I'm +0 on this proposal.
I'll leave separate feedback (not directly related to the proposal on this ticket) in the mailing list thread.
comment:6 by , 14 years ago
Thanks for your comments, Gabriel.
You write:
Adding a Site.objects.get_site_for_id() method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO) unless it has some other handling built in which you haven't explained.
The cacheing logic would be moved from Site.objects.get_current() to Site.objects.get_site_for_id(). This way, the new implementation would be computationally equivalent to the old implementation. The interfaces would be re-factored, but the case where SITE_ID is used to retrieve the site would still have the same underlying logic.
You write:
Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users
There are just two ways to approach the precedence of SITE_ID vs. the request in determining what site to retrieve. SITE_ID can act as a default, or SITE_ID can act as the override. What you are suggesting, making SITE_ID required, would force it to act as a default. Although this is a viable option, of course, it seems less intuitive to me than causing it to act as an override. Either way, the decision should be about whether it is more intuitive for SITE_ID to be a default/fallback, or for it to override whatever might be in the request. Also, I do not see what would be automagical about using the request to lookup the site, that's the whole point of this ticket.
You write:
I'm -1 on adding a default_to_request_site parameter to get_current_site(), though. The entire point of get_current_site() is that it should always return a Site or RequestSite (unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects directly.
Given that among the proposed requirements is the idea that users should be able to provide their own overriding SITE_CALLBACK method, which could return anything at all, it seems futile to narrowly define get_current_site() in terms of what types of objects it returns. Nevertheless, it is important, I think, to enable 3rd party app designers to be able to control whether or not the object they are receiving is an ORM object.
In most cases, the third-party app will want to link to the ORM object either via a ForeignKey, or more likely (given that the specific class cannot be ensured) through a generic relation. In few cases will a RequestSite or some other mock object be acceptable. Direct calls to SiteManager methods will also be problematic, for a number of reasons: if SITE_CALLBACK is defined, the return value may not be a Site object; direct calls to SiteManager methods will circumvent the logic managing the use of SITE_ID vs. the request in retrieving the Site object; a central method is required in order to encourage consistency of approach.
Perhaps the correct name of this parameter is require_orm_object
or something like that, instead, given that it should also inform what is returned by SITE_CALLBACK.
comment:7 by , 14 years ago
In case anyone is interested, I've started to put my sites app hacks in a 3rd party app (https://github.com/jezdez/django-sites-tools/). Currently, it only provides a custom middleware (and a view decorator) to add a request.site object and a model manager that can filter objects depending on the current site. I think adding a SITES_DETECTION_BACKEND setting that is used by the middleware would be easy to implement.
comment:8 by , 14 years ago
Cc: | added |
---|
comment:9 by , 14 years ago
The above patch is not complete, by any means. It lacks complete testing, and contains no modifications at all to the documentation. However, I do believe it to reflect the views of the people who have voiced their opinions here and in the original thread.
comment:10 by , 14 years ago
Has patch: | set |
---|
comment:11 by , 14 years ago
Patch needs improvement: | set |
---|
comment:12 by , 14 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
comment:14 by , 14 years ago
Hi, It would be great if user could be isolated between sites, I created a Ticket with some patches that made this, and unittest for it here: http://code.djangoproject.com/ticket/15110 , this woudl help to improve the multitenancy feature in django. I hope it helps.
Regards.
by , 14 years ago
Attachment: | 15089_patch_b.diff added |
---|
Patch that includes all specified functionality, tests, and documentation changes. Tests for sites and sites_framework pass, but other tests have not been run.
comment:15 by , 14 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
Although additional feedback is obviously needed, I think that this patch provides everything required to move this ticket forward, including test coverage for all the new features, as well as changes to the documentation reflecting the new approach.
comment:16 by , 14 years ago
Cc: | added |
---|
comment:17 by , 14 years ago
Keywords: | blocker removed |
---|---|
milestone: | 1.3 |
Patch needs improvement: | set |
I'm not sure why this is marked as 1.3 blocker. Much as I'd like to see it added, dynamic multitenancy is a new feature for contrib.sites that belongs in 1.4 at this point. I haven't seen anywhere in this thread a bug identified in a 1.3 feature. It's not a bug that the new get_current_site function doesn't support dynamic multitenancy, as contrib.sites has never supported that, and get_current_site wasn't intended to add support for it (it was intended to add a RequestSite fallback when contrib.sites is not installed, and it already does that just fine). So I'm removing the 1.3 milestone and blocker flag - Russell, please feel free to reverse if there's something obvious I'm missing.
I've posted some more design-decisiony thoughts on the feature and current patch here: http://groups.google.com/group/django-developers/msg/c7e3ee09837851c3
comment:18 by , 14 years ago
Cc: | added |
---|
follow-up: 21 comment:19 by , 14 years ago
It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.
Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?
comment:20 by , 14 years ago
Cc: | added |
---|
comment:21 by , 14 years ago
Replying to linovia:
It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.
Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?
Yes, putting the request (or the request host, or a site id picked dynamically in a middleware) into threadlocals would be one approach (in fact, its the method currently chosen by third-party apps implementing dynamic multitenancy for Django), but Django's philosophy has generally been to minimize usage of threadlocals: if you need information from the request, you should pass it in to where it's needed. Thus the effort here to find a threadlocals-free approach if possible.
comment:22 by , 14 years ago
Cc: | added |
---|
comment:23 by , 14 years ago
Component: | Contrib apps → contrib.sites |
---|
comment:24 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
comment:25 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
comment:26 by , 13 years ago
Cc: | removed |
---|---|
UI/UX: | unset |
comment:27 by , 13 years ago
Cc: | added |
---|
comment:28 by , 13 years ago
Cc: | added |
---|
comment:29 by , 13 years ago
Cc: | added |
---|
comment:30 by , 13 years ago
Cc: | added |
---|---|
Version: | 1.3-beta |
I would keep get_current_site as is (eg don't perform any magic regarding SITE_ID) and just add a setting to allow users to add their own backend to resolve the current site. Looking up a Site object based on the request is a twoliner then.
Is there anything else which needs to be done to get this ticket rolling again?
comment:31 by , 12 years ago
Owner: | changed from | to
---|
comment:32 by , 12 years ago
Updated and minimalistic patch on https://github.com/apollo13/django/compare/ticket15089
comment:33 by , 12 years ago
Cc: | added |
---|
comment:34 by , 12 years ago
Cc: | added |
---|
comment:35 by , 12 years ago
Cc: | added |
---|
comment:36 by , 12 years ago
I like the latest patch as it's lightweight and somehow mitigates the issue, it'll reduce to the minimum the number of hacks required to achieve multitenancy.
It's worth noting that there are a few places in django that use SiteManager.get_current() instead of get_current_site(). A quick scan gave me sitemaps, comment feeds and comment moderation. These are usually from functions that don't have access to the request. I'd say this patch should include some refactoring to completely remove get_current() usage. It should probably be documented that any use of get_current() in third-party apps will break multitenancy, instead of recommending get_current() as it's currently the case, it should be strongly advised against.
Proper multitenancy is still not achieved, even with this patch, because it doesn't address things like MEDIA_ROOT, MEDIA_URL, EMAIL_SUBJECT_PREFIX, DEFAULT_FROM_EMAIL that are needed for most multitenant usages.
For completeness I'll also mention TIME_ZONE, LANGUAGE_CODE and ROOT_URLCONF that are important elements of multitenancy as well, even though it's already possible to affect their value on a per request basis without resorting to terrible hacks.
To solve this issue some people resort to wrapping settings and storing their values in threaded locals, at one point or another it breaks as some code don't like wrapped values. Since django is fundamentally broken when it comes to multitenancy I suggest to people that really need it and that can afford single-thread installations to simply modify settings on the fly. It's frowned upon but other solutions are just as bad.
comment:37 by , 12 years ago
Cc: | added |
---|
by , 12 years ago
Attachment: | 15089-missing-site_id.diff added |
---|
get site from request when no SITE_ID
comment:39 by , 12 years ago
I've just uploaded an intermediary patch which make get_current_site always use the request to find out the current site if SITE_ID is not in the settings. Of course, having a way to provide a custom method to retrieve the current site is still useful, but maybe in a second stage?
comment:41 by , 12 years ago
Any chance this ends up in 1.5, or is it even on anyone's radar? Just wondering since we're about to convert our site to multi-tenant architecture and have been holding off a while, hoping django would become more friendly to it.
comment:42 by , 12 years ago
Review carefully the latest patch, test it, tell us if it effectively solve issues for you. This will help us to go forward.
comment:43 by , 12 years ago
Will do. We'll be starting to implement our change-over to multitenant probably in about 1 month's time, and I will see what the patch does and does not provide that we need and let you know.
comment:44 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Dynamic multi-tenancy is a commonly requested feature, no one is against the concept.
follow-up: 47 comment:45 by , 12 years ago
If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response()
, just like we do for URLCONF. Indeed, get_current_site()
and get_urlconf()
are the two sides of the same coin.
There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current()
in the codebase (instead of get_current_site()
) hint at the problem.
Particularly troublesome if the SITE_ID is only available in the request:
- Declarative stuff like sitemaps, default values of model fields, etc.
- Random methods that don't get the request as parameter even though the calling method had access to it.
- Methods called from templates such as
get_absolute_url()
. Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.
get_current_site()
would need to be swappable just like KEY_FUNCTION
for the cache framework, so people can define their own switching logic.
Also it would be nice if the stock get_current_site()
had logic for getting the SITE_ID
from environ, so that management commands can target a specific site. Before someone mentions that you can make a temporary settings.py
that overrides theSITE_ID
, I'll say that while it works, it's not exactly practical when you have several hundred sites on a multi-tenant setup.
Finally, it's obvious that the perfect solution would also have to deal with settings, but the site switching problem can be solved without addressing it, so let's not condition this feature to a refactoring of settings that might never happen. Developers would still be on their own when it comes to switching settings per site, but this can be handled case by case by people who need it.
comment:46 by , 12 years ago
I agree that thread-local sites would make sense in many use cases. I think the whole Site model should be swappable. Then, there could be Site.get_current_site() method by which you can alter what is returned from that method. The default implementation returns site by settings.SITE_ID, but if you use a swappable model, then you can alter the implementation as you wish (middleware + thread-locals is a likely approach here...).
follow-up: 48 comment:47 by , 12 years ago
Replying to loic84:
If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in
BaseHandler .get_response()
, just like we do for URLCONF. Indeed,get_current_site()
andget_urlconf()
are the two sides of the same coin.
Absolutely not. First of making the BaseHandler
aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one, User
for instance works fine without thread locals
There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of
Site.objects.get_current()
in the codebase (instead ofget_current_site()
) hint at the problem.
If you API requires a Site, pass in a Site object where needed.
Particularly troublesome if the SITE_ID is only available in the request:
- Declarative stuff like sitemaps, default values of model fields, etc.
I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.
- Random methods that don't get the request as parameter even though the calling method had access to it.
That's a problem in the API.
- Methods called from templates such as
get_absolute_url()
. Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.
RequestContext?!
get_current_site()
would need to be swappable just likeKEY_FUNCTION
for the cache framework, so people can define their own switching logic.
That was part of my proposal.
comment:48 by , 11 years ago
Replying to apollo13:
Replying to loic84:
If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in
BaseHandler .get_response()
, just like we do for URLCONF. Indeed,get_current_site()
andget_urlconf()
are the two sides of the same coin.
Absolutely not. First of making the
BaseHandler
aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one,User
for instance works fine without thread locals
Django completely ignores one of the main part of an URL (the domain), it's a blatant omission in my opinion. Having a hook in place in BaseHandler for people or apps that care about that portion of the URL is not the same as strong coupling a contrib app. contrib.sites
could simply consume that hook.
There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of
Site.objects.get_current()
in the codebase (instead ofget_current_site()
) hint at the problem.
If your API requires a Site, pass in a Site object where needed.
I'm not talking about my own API, I'm taking about areas of Django over which I have no control.
Particularly troublesome if the SITE_ID is only available in the request:
- Declarative stuff like sitemaps, default values of model fields, etc.
I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.
Indeed you don't, and that's why I don't want to keep the curent SITE_ID in the request but in a thread local. In the shell you have control over 1. the settings, and 2. the env, so you could write SITE_ID=2 ./manage.py whatever
.
- Random methods that don't get the request as parameter even though the calling method had access to it.
That's a problem in the API, not in the sites stuff.
- Methods called from templates such as
get_absolute_url()
. Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.RequestContext?!
What do you mean? Right now you are out of luck if you need your Model.get_absolute_url()
to be site-aware, for example to return a fully qualified URL when the site for the current request is different than the site to which your model instance belongs. I don't see how RequestContext helps, even if the request is present in the template context, I won't be able to pass it as an argument to get_absolute_url
, so get_absolute_url
still has no idea about the current site.
comment:49 by , 11 years ago
Cc: | added |
---|
comment:50 by , 11 years ago
Cc: | added |
---|
comment:51 by , 11 years ago
Cc: | added |
---|
by , 11 years ago
Attachment: | 0001-By-using-the-request-to-do-the-Site-matching.patch added |
---|
Forward port the patch to git master
comment:52 by , 10 years ago
Cc: | added |
---|
comment:53 by , 10 years ago
Patch needs improvement: | unset |
---|
Removing the patch needs improvement flag to get the patch in the review queue
comment:54 by , 10 years ago
I would like to move this ticket forward in order to address the concerns in comment 5 of #21648 ('Deprecate "is_admin_site" option of auth.views.password_reset()').
I think Claude's patch is a reasonable start. PR 2460 was also submitted independently without knowledge of this ticket and is quite similar. Is anyone against this direction?
by , 10 years ago
Attachment: | 0001-Make-settings.SITE_ID-optional-for-django.contrib.si.patch added |
---|
Avoid the domain, port split already handled by request.get_host(), better commit message
comment:58 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This is a blocker because it relates to a problem/limitation with a feature added in the 1.3 timeframe; marking DDN because the design discussion hasn't been fully resolved.