Code

Opened 3 years ago

Last modified 2 weeks ago

#15089 new New feature

contrib.sites and multitenancy

Reported by: legutierr Owned by: apollo13
Component: contrib.sites Version:
Severity: Normal Keywords:
Cc: django@…, jlecker, Ciantic, linovia, NicoEchaniz, dcwatson@…, hvdklauw@…, tgecho, fernando.sanchez@…, florian+django@…, krzysiumed@…, tom@…, net147, charette.s@…, w2lkm2n@…, simonkagwe, riccardo.magliocchetti@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

With reference to this thread:

Recent changes to django trunk made in [13980] create a new site-retrieval function, get_current_site(), which supplants Site.objects.get_current(). However, in spite of the fact that get_current_site() takes a request object as an argument, that request object is not being used to retrieve the current site; only the SITE_ID setting is used. Furthermore, there is a need to allow users to register their own site-retrieval functions, should they desire different functionality .

In this change, however, backwards compatibility must be maintained:

  • if the SITE_ID setting is implemented, then that SITE_ID must be used to retrieve the site record in all cases where contrib.sites is installed.
  • when contrib.sites is not installed, a RequestSite object should be returned

Specifically, the modification should contain the following elements:

  • add a deprication warning to Site.objects.get_current_site()
  • add a method get_site_for_request() to Site.objects; this method will inspect the request, using request.get_host(), and will then return the site object matching the request.
  • add a method get_site_for_id() to Site.objects, which method can be used by get_current_site() in the case of SITE_ID being populated
  • modify get_current_site() to handle four cases:
    1. If SITE_CALLBACK (a new setting) is defined, return the result of passing the request into that callable
    2. Else if contrib.sites is not installed, return a RequestSite
    3. Else if SITE_ID is defined, use Site.objects.get_site_for_id() to get the site conforming to SITE_ID
    4. Otherwise, use Site.objects.get_site_for_request() to retrieve the site that conforms to the request.
  • add new tests to sites.tests
  • modify the contrib.sites documentation to reflect the new approach and the new setting, SITE_CALLBACK

In addition, I would like to suggest two additional changes that are not documented in the above-referenced django-developers group thread:

  1. Move the get_current_site() method out of the models.py file, and into __init__.py. The thinking behind this is that referencing settings inside a models.py file has a bit of a code smell, and that get_current_site(), now being the key control point of the library, should exist at a dependency level above the other modules.
  2. Add an additional keyword argument to get_current_site(), specifically default_to_request_site, with default value False. In the event that default_to_request_site is false, an error would be raised if Sites is not installed. The thinking behind this change is that third-party apps, the primary potential users of multitenancy implemented by contrib.sites, will most likely link their models to the Site object via a ForeignKey. As a result, RequestSite objects will produce errors if they are used, and additional logic will have to be implemented to special-case the handling of RequestSite objects in such a situation.

These changes, if pursued, would also require modifications in the other contrib apps that use RequestSite.

Attachments (5)

15089_patch_a.diff (9.5 KB) - added by legutierr 3 years ago.
A first version of a patch, for review.
15089_patch_b.diff (40.4 KB) - added by legutierr 3 years ago.
Patch that includes all specified functionality, tests, and documentation changes. Tests for sites and sites_framework pass, but other tests have not been run.
15089-missing-site_id.diff (5.8 KB) - added by claudep 19 months ago.
get site from request when no SITE_ID
0001-By-using-the-request-to-do-the-Site-matching.patch (6.3 KB) - added by rm_ 2 weeks ago.
Forward port the patch to git master
0001-Make-settings.SITE_ID-optional-for-django.contrib.si.patch (6.3 KB) - added by rm_ 2 weeks ago.
Avoid the domain, port split already handled by request.get_host(), better commit message

Download all attachments as: .zip

Change History (56)

comment:1 Changed 3 years ago by legutierr

  • Component changed from Uncategorized to Contrib apps
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 years ago by legutierr

  • Version changed from 1.2 to 1.3-beta

comment:3 Changed 3 years ago by russellm

  • Keywords blocker added
  • milestone set to 1.3
  • Triage Stage changed from Unreviewed to Design decision needed

This is a blocker because it relates to a problem/limitation with a feature added in the 1.3 timeframe; marking DDN because the design discussion hasn't been fully resolved.

comment:4 Changed 3 years ago by anonymous

  • Cc django@… added

comment:5 Changed 3 years ago by gabrielhurley

The premise of this ticket seems good to me, though there are definitely some sticking points in the design.

Expanding the way get_current_site() does its lookups seems like the most worthwhile portion of this proposal, so I can definitely get behind that.

Because Site.objects.get_current_site() is a model manager method I don't see deprecating it as a necessity, but I also don't mind it for the sake of avoiding confusion.

Adding a Site.objects.get_site_for_id() method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO) unless it has some other handling built in which you haven't explained.

Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users.

I'm -1 on adding a default_to_request_site parameter to get_current_site(), though. The entire point of get_current_site() is that it should always return a Site or RequestSite (unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects directly. This may, in fact, be an argument in favor of *not* deprecating Site.objects.get_current_site() if this use case is of real concern.

Those are my current thoughts. All in all I'm +0 on this proposal.

I'll leave separate feedback (not directly related to the proposal on this ticket) in the mailing list thread.

comment:6 Changed 3 years ago by legutierr

Thanks for your comments, Gabriel.

You write:

Adding a Site.objects.get_site_for_id() method seems wholly unnecessary just to avoid writing Site.objects.get(id=FOO) unless it has some other handling built in which you haven't explained.

The cacheing logic would be moved from Site.objects.get_current() to Site.objects.get_site_for_id(). This way, the new implementation would be computationally equivalent to the old implementation. The interfaces would be re-factored, but the case where SITE_ID is used to retrieve the site would still have the same underlying logic.

You write:

Under the current system, if contrib.sites is installed but SITE_ID is missing an error should be raised, it shouldn't fall back to doing a Site object lookup based on the request automagically. Making SITE_ID an optional (rather than required) setting for the Sites Framework isn't a wholly trivial change and could easily confuse users

There are just two ways to approach the precedence of SITE_ID vs. the request in determining what site to retrieve. SITE_ID can act as a default, or SITE_ID can act as the override. What you are suggesting, making SITE_ID required, would force it to act as a default. Although this is a viable option, of course, it seems less intuitive to me than causing it to act as an override. Either way, the decision should be about whether it is more intuitive for SITE_ID to be a default/fallback, or for it to override whatever might be in the request. Also, I do not see what would be automagical about using the request to lookup the site, that's the whole point of this ticket.

You write:

I'm -1 on adding a default_to_request_site parameter to get_current_site(), though. The entire point of get_current_site() is that it should always return a Site or RequestSite (unless you've misconfigured the Sites Framework). If third-party app authors require that they always get back a Site object, then they should be doing lookups using Site.objects directly.

Given that among the proposed requirements is the idea that users should be able to provide their own overriding SITE_CALLBACK method, which could return anything at all, it seems futile to narrowly define get_current_site() in terms of what types of objects it returns. Nevertheless, it is important, I think, to enable 3rd party app designers to be able to control whether or not the object they are receiving is an ORM object.

In most cases, the third-party app will want to link to the ORM object either via a ForeignKey, or more likely (given that the specific class cannot be ensured) through a generic relation. In few cases will a RequestSite or some other mock object be acceptable. Direct calls to SiteManager methods will also be problematic, for a number of reasons: if SITE_CALLBACK is defined, the return value may not be a Site object; direct calls to SiteManager methods will circumvent the logic managing the use of SITE_ID vs. the request in retrieving the Site object; a central method is required in order to encourage consistency of approach.

Perhaps the correct name of this parameter is require_orm_object or something like that, instead, given that it should also inform what is returned by SITE_CALLBACK.

comment:7 Changed 3 years ago by jezdez

In case anyone is interested, I've started to put my sites app hacks in a 3rd party app (https://github.com/jezdez/django-sites-tools/). Currently, it only provides a custom middleware (and a view decorator) to add a request.site object and a model manager that can filter objects depending on the current site. I think adding a SITES_DETECTION_BACKEND setting that is used by the middleware would be easy to implement.

comment:8 Changed 3 years ago by jlecker

  • Cc jlecker added

Changed 3 years ago by legutierr

A first version of a patch, for review.

comment:9 Changed 3 years ago by legutierr

The above patch is not complete, by any means. It lacks complete testing, and contains no modifications at all to the documentation. However, I do believe it to reflect the views of the people who have voiced their opinions here and in the original thread.

comment:10 Changed 3 years ago by legutierr

  • Has patch set

comment:11 Changed 3 years ago by legutierr

  • Patch needs improvement set

comment:12 Changed 3 years ago by legutierr

  • Needs documentation set
  • Needs tests set

comment:13 Changed 3 years ago by jezdez

Awesome!

comment:14 Changed 3 years ago by jorgeecardona

Hi, It would be great if user could be isolated between sites, I created a Ticket with some patches that made this, and unittest for it here: http://code.djangoproject.com/ticket/15110 , this woudl help to improve the multitenancy feature in django. I hope it helps.

Regards.

Changed 3 years ago by legutierr

Patch that includes all specified functionality, tests, and documentation changes. Tests for sites and sites_framework pass, but other tests have not been run.

comment:15 Changed 3 years ago by legutierr

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Although additional feedback is obviously needed, I think that this patch provides everything required to move this ticket forward, including test coverage for all the new features, as well as changes to the documentation reflecting the new approach.

comment:16 Changed 3 years ago by hvdklauw

  • Cc hvdklauw added

comment:17 Changed 3 years ago by carljm

  • Keywords blocker removed
  • milestone 1.3 deleted
  • Patch needs improvement set

I'm not sure why this is marked as 1.3 blocker. Much as I'd like to see it added, dynamic multitenancy is a new feature for contrib.sites that belongs in 1.4 at this point. I haven't seen anywhere in this thread a bug identified in a 1.3 feature. It's not a bug that the new get_current_site function doesn't support dynamic multitenancy, as contrib.sites has never supported that, and get_current_site wasn't intended to add support for it (it was intended to add a RequestSite fallback when contrib.sites is not installed, and it already does that just fine). So I'm removing the 1.3 milestone and blocker flag - Russell, please feel free to reverse if there's something obvious I'm missing.

I've posted some more design-decisiony thoughts on the feature and current patch here: http://groups.google.com/group/django-developers/msg/c7e3ee09837851c3

comment:18 Changed 3 years ago by Ciantic

  • Cc Ciantic added

comment:19 follow-up: Changed 3 years ago by linovia

It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.

Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?

comment:20 Changed 3 years ago by linovia

  • Cc linovia added

comment:21 in reply to: ↑ 19 Changed 3 years ago by carljm

Replying to linovia:

It may sound a bit naive but why would we need the request object to get the site ?
I mean, urlconf management can be linked to a request, but the code stores it within the local thread so that any reverse can be done without explicitly giving the current request.

Wouldn't it make sense to store the site id the same way the urlconfs are handled and defaults it to a DEFAULT_SITE defined in the settings file ?

Yes, putting the request (or the request host, or a site id picked dynamically in a middleware) into threadlocals would be one approach (in fact, its the method currently chosen by third-party apps implementing dynamic multitenancy for Django), but Django's philosophy has generally been to minimize usage of threadlocals: if you need information from the request, you should pass it in to where it's needed. Thus the effort here to find a threadlocals-free approach if possible.

comment:22 Changed 3 years ago by NicoEchaniz

  • Cc NicoEchaniz added

comment:23 Changed 3 years ago by gabrielhurley

  • Component changed from Contrib apps to contrib.sites

comment:24 Changed 3 years ago by jaddison

  • Severity set to Normal
  • Type set to New feature

comment:25 Changed 3 years ago by dcwatson

  • Cc dcwatson@… added
  • Easy pickings unset

comment:26 Changed 3 years ago by hvdklauw

  • Cc hvdklauw removed
  • UI/UX unset

comment:27 Changed 3 years ago by hvdklauw

  • Cc hvdklauw@… added

comment:28 Changed 2 years ago by tgecho

  • Cc tgecho added

comment:29 Changed 2 years ago by fernando.sanchez@…

  • Cc fernando.sanchez@… added

comment:30 Changed 2 years ago by apollo13

  • Cc florian+django@… added
  • Version 1.3-beta deleted

I would keep get_current_site as is (eg don't perform any magic regarding SITE_ID) and just add a setting to allow users to add their own backend to resolve the current site. Looking up a Site object based on the request is a twoliner then.

Is there anything else which needs to be done to get this ticket rolling again?

comment:31 Changed 23 months ago by apollo13

  • Owner changed from nobody to apollo13

comment:32 Changed 23 months ago by apollo13

comment:33 Changed 22 months ago by krzysiumed

  • Cc krzysiumed@… added

comment:34 Changed 20 months ago by rizumu

  • Cc tom@… added

comment:35 Changed 20 months ago by net147

  • Cc net147 added

comment:36 Changed 19 months ago by anonymous

I like the latest patch as it's lightweight and somehow mitigates the issue, it'll reduce to the minimum the number of hacks required to achieve multitenancy.

It's worth noting that there are a few places in django that use SiteManager.get_current() instead of get_current_site(). A quick scan gave me sitemaps, comment feeds and comment moderation. These are usually from functions that don't have access to the request. I'd say this patch should include some refactoring to completely remove get_current() usage. It should probably be documented that any use of get_current() in third-party apps will break multitenancy, instead of recommending get_current() as it's currently the case, it should be strongly advised against.

Proper multitenancy is still not achieved, even with this patch, because it doesn't address things like MEDIA_ROOT, MEDIA_URL, EMAIL_SUBJECT_PREFIX, DEFAULT_FROM_EMAIL that are needed for most multitenant usages.

For completeness I'll also mention TIME_ZONE, LANGUAGE_CODE and ROOT_URLCONF that are important elements of multitenancy as well, even though it's already possible to affect their value on a per request basis without resorting to terrible hacks.

To solve this issue some people resort to wrapping settings and storing their values in threaded locals, at one point or another it breaks as some code don't like wrapped values. Since django is fundamentally broken when it comes to multitenancy I suggest to people that really need it and that can afford single-thread installations to simply modify settings on the fly. It's frowned upon but other solutions are just as bad.

comment:37 Changed 19 months ago by charettes

  • Cc charette.s@… added

comment:38 Changed 19 months ago by Claude Paroz <claude@…>

In 6c2faaceb0482267cec19da0ff432984028f9d0c:

Made more extensive use of get_current_site

Refs #15089

Changed 19 months ago by claudep

get site from request when no SITE_ID

comment:39 Changed 19 months ago by claudep

I've just uploaded an intermediary patch which make get_current_site always use the request to find out the current site if SITE_ID is not in the settings. Of course, having a way to provide a custom method to retrieve the current site is still useful, but maybe in a second stage?

comment:40 Changed 19 months ago by Claude Paroz <claude@…>

In 1ce4aedcefb68086918adc4137d75a6f2c0bd1f2:

Prevented flatpage view from directly accessing settings.SITE_ID

Refs #15089

comment:41 Changed 18 months ago by nebstrebor

Any chance this ends up in 1.5, or is it even on anyone's radar? Just wondering since we're about to convert our site to multi-tenant architecture and have been holding off a while, hoping django would become more friendly to it.

comment:42 Changed 18 months ago by claudep

Review carefully the latest patch, test it, tell us if it effectively solve issues for you. This will help us to go forward.

comment:43 Changed 18 months ago by nebstrebor

Will do. We'll be starting to implement our change-over to multitenant probably in about 1 month's time, and I will see what the patch does and does not provide that we need and let you know.

comment:44 Changed 13 months ago by aaugustin

  • Triage Stage changed from Design decision needed to Accepted

Dynamic multi-tenancy is a commonly requested feature, no one is against the concept.

comment:45 follow-up: Changed 13 months ago by loic84

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.
  • Random methods that don't get the request as parameter even though the calling method had access to it.
  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

get_current_site() would need to be swappable just like KEY_FUNCTION for the cache framework, so people can define their own switching logic.

Also it would be nice if the stock get_current_site() had logic for getting the SITE_ID from environ, so that management commands can target a specific site. Before someone mentions that you can make a temporary settings.py that overrides theSITE_ID, I'll say that while it works, it's not exactly practical when you have several hundred sites on a multi-tenant setup.

Finally, it's obvious that the perfect solution would also have to deal with settings, but the site switching problem can be solved without addressing it, so let's not condition this feature to a refactoring of settings that might never happen. Developers would still be on their own when it comes to switching settings per site, but this can be handled case by case by people who need it.

comment:46 Changed 13 months ago by akaariai

I agree that thread-local sites would make sense in many use cases. I think the whole Site model should be swappable. Then, there could be Site.get_current_site() method by which you can alter what is returned from that method. The default implementation returns site by settings.SITE_ID, but if you use a swappable model, then you can alter the implementation as you wish (middleware + thread-locals is a likely approach here...).

comment:47 in reply to: ↑ 45 ; follow-up: Changed 13 months ago by apollo13

Replying to loic84:

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

Absolutely not. First of making the BaseHandler aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one, User for instance works fine without thread locals

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

If your API requires a Site, pass in a Site object where needed.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.

I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.

  • Random methods that don't get the request as parameter even though the calling method had access to it.

That's a problem in the API, not in the sites stuff.

  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

RequestContext?!

get_current_site() would need to be swappable just like KEY_FUNCTION for the cache framework, so people can define their own switching logic.

That was part of my proposal.

Last edited 13 months ago by apollo13 (previous) (diff)

comment:48 in reply to: ↑ 47 Changed 10 months ago by loic84

Replying to apollo13:

Replying to loic84:

If we are serious about having multi-tenancy in core, we need the current site to be made a thread local and a first class citizen in BaseHandler .get_response(), just like we do for URLCONF. Indeed, get_current_site() and get_urlconf() are the two sides of the same coin.

Absolutely not. First of making the BaseHandler aware of the current site would imply strong coupling of a contrib app to core -- that's an absolute no-go. I also don't agree with the thread local, there is no need imo to make it one, User for instance works fine without thread locals

Django completely ignores one of the main part of an URL (the domain), it's a blatant omission in my opinion. Having a hook in place in BaseHandler for people or apps that care about that portion of the URL is not the same as strong coupling a contrib app. contrib.sites could simply consume that hook.

There are plenty of areas of Django that need to be aware of the current site and that aren't reached (most of the time for good reasons) by the request object. The remaining uses of Site.objects.get_current() in the codebase (instead of get_current_site()) hint at the problem.

If your API requires a Site, pass in a Site object where needed.

I'm not talking about my own API, I'm taking about areas of Django over which I have no control.

Particularly troublesome if the SITE_ID is only available in the request:

  • Declarative stuff like sitemaps, default values of model fields, etc.

I agree that this is a problem, but I don't think thread-locals are the solution, especially since in the shell you don't have access to the request either.

Indeed you don't, and that's why I don't want to keep the curent SITE_ID in the request but in a thread local. In the shell you have control over 1. the settings, and 2. the env, so you could write SITE_ID=2 ./manage.py whatever.

  • Random methods that don't get the request as parameter even though the calling method had access to it.

That's a problem in the API, not in the sites stuff.

  • Methods called from templates such as get_absolute_url(). Being able to pass the request as a parameter to such methods is the main reason I've been assessing Jinja2.

RequestContext?!

What do you mean? Right now you are out of luck if you need your Model.get_absolute_url() to be site-aware, for example to return a fully qualified URL when the site for the current request is different than the site to which your model instance belongs. I don't see how RequestContext helps, even if the request is present in the template context, I won't be able to pass it as an argument to get_absolute_url, so get_absolute_url still has no idea about the current site.

comment:49 Changed 8 months ago by Walkman

  • Cc w2lkm2n@… added

comment:50 Changed 3 months ago by simonkagwe

  • Cc simonkagwe added

comment:51 Changed 2 weeks ago by rm_

  • Cc riccardo.magliocchetti@… added

Changed 2 weeks ago by rm_

Forward port the patch to git master

Changed 2 weeks ago by rm_

Avoid the domain, port split already handled by request.get_host(), better commit message

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from apollo13 to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.