Code

Opened 3 years ago

Closed 6 months ago

Last modified 2 months ago

#16855 closed Bug (fixed)

select_related doesn't chain as expected

Reported by: Leo Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: jdunck@…, erik.wognsen@…, seldon, treyhunner, hrehfeld@…, tomek@…, marc.tamlyn@…, mike@…, i.virabyan@…, real.human@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I'm doing something like:

Foo.objects.select_related('fieldone').select_related('fieldtwo')

I would expect the select related to function on both fieldone and fieldtwo. What actually happens is that the select_related only works on fieldtwo. This is a problem for compositing queries at runtime (for example in search filters) and is a departure from how I would expect queryset methods to work.

If the design decision is to keep it this way, there should be a note in the docs pointing this out.

Attachments (0)

Change History (39)

comment:1 Changed 3 years ago by carljm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Agreed.

comment:2 Changed 3 years ago by carljm

Jeremy Dunck observes in this django-developers thread that in the current implementation the "depth" setting is apparently sticky, while fields are not, and gives a more detailed outline of the desired behavior of depth in combination with multiple calls to select_related.

comment:3 Changed 3 years ago by Leo

Thanks Carl, that approach makes a lot of sense to me and would solve my use case. At the moment I have to go dig at the 'private' data structure that's holding the depth and list of fields to act on in the queryset object to get what I need. It sort of works but is a really fragile hack. If this behavior could get fixed in an upcoming 1.3.x release that would be fantastic.

comment:4 Changed 3 years ago by carljm

Hi Leo - thanks for confirming that the proposed fix would meet your use case! Unfortunately it's quite unlikely that this fix will make it into a 1.3.X release. In our current documented release process, the last stable branch receives only critical bugfixes (security, data-loss, or crashing bugs). This doesn't fit into any of those categories.

This could certainly still make it into 1.4, but that would require someone to put together a patch in the near future.

comment:5 Changed 3 years ago by jdunck

  • Cc jdunck@… added

comment:6 Changed 3 years ago by erw

  • Cc erik.wognsen@… added

I have a problem where using select_related with field arguments in the queryset method of my ModelAdmin doesn't work.

Could this bug also be the cause of that problem because select_related is applied again later by the admin, clearing my field choices?

comment:7 Changed 2 years ago by seldon

  • Cc seldon added

comment:8 Changed 2 years ago by treyhunner

  • Cc treyhunner added

comment:9 Changed 2 years ago by hrehfeld@…

By no means I am an experienced Django user, but this https://gist.github.com/2549147 seems to fix this issue on my single usecase. No further testing done. Can someone verify this?

comment:10 Changed 2 years ago by hrehfeld@…

  • Cc hrehfeld@… added

comment:11 Changed 23 months ago by oinopion

  • Cc tomek@… added

comment:12 Changed 23 months ago by oinopion

  • Has patch set

comment:13 Changed 23 months ago by oinopion

Anssi Kääriäinen left some good comments on my pull request. Now I think there's need to extend this bug to consider following scenario:

# should get all immediate relations AND the Order
Species.objects.select_related(depth=1).select_related('genus__family__order') 

comment:14 Changed 23 months ago by oinopion

  • Patch needs improvement set

comment:15 Changed 23 months ago by akaariai

I am afraid that dealing with the case of select_related(depth=1).select_related('genus__family__order') will be a lot more complicated than the patch you have currently.

I would look into trying to turn the select related descents (in db/models/query.py:get_klass_info, db/models/sql/compiler.py:fill_related_selections) into two phase implementations. In the first phase, turn the depth into lookups, in the second phase just add the lookups, and remove the depth arguments from the current descent implementations. Of course, there are numerous ways forward here. Of course, if you spot some easier way, use it instead.

Version 0, edited 23 months ago by akaariai (next)

comment:16 Changed 23 months ago by mjtamlyn

  • Cc marc.tamlyn@… added

comment:17 Changed 21 months ago by carbonXT

  • Cc mike@… added

comment:18 Changed 18 months ago by ivan_virabyan

  • Cc i.virabyan@… added

comment:19 Changed 18 months ago by lukeplant

One solution might be to deprecate the 'depth' argument to select_related(). I never thought it was a good idea, and I don't think I've ever used it in practice, because it could so easily start doing the wrong thing if you change your schema. It also just seems wrong - if you are using select_related(), you are making specific fine-tunings to the SQL produced. To do that using a broad "just add joins to a depth = n" instruction seems bizarre.

If depth was deprecated, I don't think it would be too bad to have bugs related to it (since we've already got such bugs) until it is finally removed, and then we can avoid fixing the combination.

Also, while I'm here, we should make the API for clearing match prefetch_related() and defer() i.e. pass None as single argument to clear.

Last edited 18 months ago by lukeplant (previous) (diff)

comment:20 Changed 18 months ago by mjtamlyn

+1 to deprecating depth.

comment:21 Changed 18 months ago by ivan_virabyan

+1, never seen anyone using depth argument

comment:22 Changed 18 months ago by akaariai

+1 from me, too. It is dangerous to use. In addition, the select_related code will be a little bit cleaner if depth was removed (not much, but even a little is something...)

comment:23 Changed 18 months ago by mjtamlyn

Proposed deprecation of depth to go into 1.5 release on django-dev list at https://groups.google.com/forum/?fromgroups=#!topic/django-developers/K3Fzqw9RRgg

comment:24 Changed 18 months ago by aaugustin

+1 for deprecating depth too.

comment:25 Changed 18 months ago by mjtamlyn

Given the overwhelmingly positive response from core devs to deprecating depth, I went ahead and made the PR: https://github.com/django/django/pull/488

comment:26 Changed 18 months ago by Preston Holmes <preston@…>

In 965cc0b1ffa27574e5684a18f9400577e5b4598d:

Deprecated depth kwarg on select_related.

This is the start of a deprecation path for the depth kwarg on
select_related. Removing this will allow us to update select_related so
it chains properly and have an API similar to prefetch_related.

Thanks to Marc Tamlyn for spearheading and initial patch.

refs #16855

comment:27 Changed 18 months ago by Preston Holmes <preston@…>

In 0131da06220c1bc7e25df9ea50219e480e5004b3:

[1.5.x] Deprecated depth kwarg on select_related.

This is the start of a deprecation path for the depth kwarg on
select_related. Removing this will allow us to update select_related so
it chains properly and have an API similar to prefetch_related.

Thanks to Marc Tamlyn for spearheading and initial patch.

refs #16855

comment:28 Changed 18 months ago by lukeplant

If we are deprecating the depth argument, we need to remove our own usage of it, or deprecate APIs that use it. The admin uses it:

https://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_select_related

I think we should deprecate this entirely, and encourage people instead to implement ModelAdmin.get_queryset in order to add select_related or prefetch_related calls.

We also should fix the admin so that it does the right thing when reacting to list_display - currently it is pretty stupid:

https://github.com/django/django/blob/79dd751b0b9e428bf386d7304f442fa9815fdbba/django/contrib/admin/views/main.py#L343

We can either make it more intelligent, or remove this behaviour entirely, but the latter is more likely to give performance regressions.

comment:29 Changed 18 months ago by mjtamlyn

I agree it would be nice to tidy up the use of select_related in the admin - though we do need to make sure it remains intelligent, I know a lot of my sites depend on that performance gain.

To clarify though, as far as I can tell none of Django is broken by the removal of the depth kwarg - it's still acceptable at present (if inadvisable) to pass no arguments to select_related() and have Django auto-follow all of the FKs (i.e. depth=infinity). To remove that as well is more likely to be a major backwards incompatibility.

This does beg the question as to what the correct API should be for chaining and clearing - is it ok to have select_related() and select_related(None) do different things? (former would add all automatically, latter would clear list.)

comment:30 Changed 18 months ago by lukeplant

Ah - I was actually thinking we should only have select_related(*fields), and remove the behaviour where no arguments means 'select everything'. I think that has the same problems as selecting to an arbitrary depth. I was actually thinking that the current implementation worked by having 'depth=5' as the default, so select_related() was the same as select_related(depth=5), and the former would be automatically deprecated along with the latter.

In fact, the depth = 5 idea is implemented at a lower level - in django/db/models/sql/query.py. So it still doesn't do infinite traversal AFAICS.

comment:31 Changed 18 months ago by akaariai

How about changing list_select_related to accept a list of selections. That way you could pretty easily define the needed select related stuff.

...
    list_display = 'my_fk_field'
    list_select_related = ('my_fk_field', 'my_fk_field__more')

Raise deprecation warning for True in there.

get_queryset() override works, too, though the admin seems to prefer the declarative attribute style...

Intelligent select_related is somewhat hard to do correctly, although that depends on the definition of what intelligent select_related means...

I am in favour of getting rid of unlimited or depth=5 select_related() calls altogether.

comment:32 Changed 18 months ago by mjtamlyn

"Intelligent" select_related in the admin should just be a case of ensuring all the columns on the list view which are foreign keys are passed into a select_related call. I haven't looked at the practicalities of this, but I know that the statements needed are the same as those needed for ordering by that column, so we should be able to reuse some code (and the admin_order_field property for methods which create columns).

I like the new version of list_select_related.

There are a few uses of a blank select_related in the auth code, and a stack of them in the tests but these should just be a case of working out which fields need to change and updating them.

I've posted on the dev group again to clarify this change in deprecation - to me it's a bigger deal that just removing depth now and I'm probably only +0 on it.

comment:33 Changed 17 months ago by charettes

Implementing such an intelligent detection would be hard; think of the list_display callbacks that may access foreign objects.

# models.py
class City(models.Model):
   name = models.CharField(max_length=30)

   def __unicode__(self):
       return self.name

class Street(models.Model):
    name = models.CharField(max_length=30)
    city = models.ForeignKey(City)

   def __unicode__(self):
       return "%s, %s" % (self.name, self.city)

# admin.py
class StreetAdmin(admin.ModelAdmin):
    list_display = ('__unicode__', 'display_city',)
    list_select_related = True

    def display_city(self, obj):
        city = obj.city
        return '<a href="%s">%s</a>' % (city.get_absolute_url(), city)
    display_city.allow_tags = True

If we replace the list_select_related = True behavior to an automatic detection we should introduce another property (inspired by admin_order_field) on those callbacks, say admin_list_select_related, to expose which field the intelligent mechanism should select. Otherwise people will have to override get_queryset to achieve what was automatically done by specifying list_select_related = True prior to this change.

FWIW I prefer the list_select_related change proposed by @akaariai (list of selection) since it's way more declarative and less error prone.

comment:34 Changed 11 months ago by oinopion

Described list_select_related change was committed as fix for #19080 (9ed971f4), though it does not deprecate the Boolean variant.

comment:35 Changed 11 months ago by mjtamlyn

I was thinking about this when I committed the fix for #19080 - I think it should be reasonable (and a good compromise solution to this issue) to have effectively two reset-mechanisms on this function. With depth gone after 1.6, there will be three ways of calling select_related:

select_related('foo', 'bar') is the preferred usage, where you specify the fields. Subsequent calls of this form will append to each other.
select_related() removes list and replaces with "all" (actually roughly the same as original depth=5).
select_related(None) removes list completely.

I think this will be pretty consistent with the existing behaviour (given multiple calls with names are pretty weird now), and also more consistent with other similar functions (like prefetch_related).

comment:36 Changed 6 months ago by timo

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

PR from Marc.

comment:37 Changed 6 months ago by Marc Tamlyn <marc.tamlyn@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 349c12d3f5c015c2f8b36917da21230c2ac1acb4:

Fixed #16855 -- select_related() chains as expected.

select_related('foo').select_related('bar') is now equivalent to
select_related('foo', 'bar').

Also reworded docs to recommend select_related(*fields) over select_related()

comment:38 Changed 2 months ago by mrmachine

Please do not further reduce the functionality of select_related() by removing the no-arguments variant (select all related, with a high hard coded depth limit to prevent recursive joins).

I use select_related() with no arguments all the time, as I have found that in almost all my use cases, following too many joins and returning more data than may be necessary is still much more performant than returning too little data, and subsequently having template authors generate additional queries by accessing related fields that were not selected in the view.

It also saves the view programmer from tediously listing larger numbers of related fields when they legitimately want to select most or all related fields, and avoids the need for view programmers to know in advance which related fields a template author will want to access.

I much rather fallback to naming explicit fields in select_related() only when there is an actual performance problem because of too many joins, instead of premature optimisation by trying to explicitly list all relations, and having to re-visit that code when a template or model changes.

comment:39 Changed 2 months ago by mrmachine

  • Cc real.human@… added

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.