Opened 12 years ago

Closed 12 years ago

#17485 closed Bug (fixed)

Queries with both deferred fields and select_related defer field.name instead of field.attname

Reported by: Michal Petrucha Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords: defer select_related
Cc: niwi@…, real.human@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When deferring ForeignKey fields in conjunction with select_related, Django creates a DeferredAttribute for the field's name instead of its attname. This results in its ReverseSingleRelatedObjectDescriptor being overriden and thus stripping the model instance of most of this field's functionality.

Consider the following models (taken from regressiontests.queries):

class Celebrity(models.Model):
    name = models.CharField("Name", max_length=20)
    greatest_fan = models.ForeignKey("Fan", null=True, unique=True)

    def __unicode__(self):
        return self.name

class Fan(models.Model):
    fan_of = models.ForeignKey(Celebrity)

Let's play around with it in the shell for a bit:

>>> Celebrity.objects.create(name="Joe")
<Celebrity: Joe>
>>> obj = Celebrity.objects.all().defer('greatest_fan').select_related('greatest_fan').get()
>>> print obj.__class__.__dict__
{
  '__module__': 'subclassfktest.models',
  '_meta': <Options for Celebrity_Deferred_greatest_fan>,
  'objects': <django.db.models.manager.ManagerDescriptor object at 0x1460810>,
  'MultipleObjectsReturned': <class 'subclassfktest.models.MultipleObjectsReturned'>,
  '_base_manager': <django.db.models.manager.Manager object at 0x1460490>,
  'greatest_fan': <django.db.models.query_utils.DeferredAttribute object at 0x1457750>,
  'DoesNotExist': <class 'subclassfktest.models.DoesNotExist'>,
  '__doc__': 'Celebrity_Deferred_greatest_fan(id, name, greatest_fan_id)',
  '_default_manager': <django.db.models.manager.Manager object at 0x1460ad0>,
  '_deferred': True
}

We can see that the deferred atribute is at greatest_fan instead of greatest_fan_id and the ReverseSingleRelatedObjectDescriptor is not present at all.

The following is just to show that the model instance is really broken because of this:

>>> print obj.greatest_fan
None
>>> f = Fan.objects.create(fan_of=obj)
>>> obj.greatest_fan = f
>>> obj.save()
>>> obj2 = Celebrity.objects.get()
>>> f.id
1
>>> print obj2.greatest_fan_id        # Should be 1
None

An interesting thing, though, is that the instance gives access to the right related object instance even without a working ReverseSingleRelatedObjectDescriptor:

>>> obj2.greatest_fan = f
>>> obj2.save()
>>> obj = Celebrity.objects.all().defer('greatest_fan').select_related('greatest_fan').get()
>>> obj.greatest_fan.id
1
>>> obj.__dict__
{'_greatest_fan_cache': <Fan: Fan object>, 'name': u'Joe', 'greatest_fan_id': None, '_state': <django.db.models.base.ModelState object at 0x146b250>, 'greatest_fan': <Fan: Fan object>, 'id': 1}

Anyway, the _id attribute is still never set to the correct value.

The fix for this behavior is a one-liner, see attachment. It makes sure to defer the field's attname even in this case (in all other cases where deferred_class_factory is called, attnames are used). There are side effects, though.

Actually, this bug was hiding another one for which even tests exist but because of this one they pass.

The first test failure is test_basic in regressiontests.defer_regress.tests.DeferRegressionTest which is quite obvious, it lists the incorrect model name caused by this bug and is fixed easily (since it is actually an incorrect test).

The other one, though, is not that simple: test_defer in modeltests.defer.tests.DeferTests. More precisely the following two lines:

        # DOES THIS WORK?
        self.assert_delayed(qs.only("name").select_related("related")[0], 1)
        self.assert_delayed(qs.defer("related").select_related("related")[0], 0)

where related is a ForeignKey.

These have been around for a while and the only reason they passed until now is that assert_delayed checks the attname of each field whether it is deferred or not. In this case, however, the attname is obviously not deferred, since the name is in its stead.

The question is, what should we do with these two tests? Do we want to fix them in one go with this bug or file a new ticket and temporarily comment out the two failing test lines?

Attachments (3)

defer_select_related.patch (2.2 KB ) - added by Michal Petrucha 12 years ago.
Fix for the first bug along with tests for it.
defer_select_related-throws_exception.patch (12.7 KB ) - added by Michal Petrucha 12 years ago.
Patch containing fixes for both issues along with tests and docs
17485-only-select_related.diff (2.3 KB ) - added by Tai Lee 12 years ago.
failing test case

Download all attachments as: .zip

Change History (18)

by Michal Petrucha, 12 years ago

Attachment: defer_select_related.patch added

Fix for the first bug along with tests for it.

comment:1 by Aymeric Augustin, 12 years ago

Triage Stage: UnreviewedAccepted

You've clearly demonstrated that this is a bug: Django misbehaves badly in this situation.

It seems to me that select_related is the opposite of defer. Is there a use case for using both on the same field? Maybe it's .defer('foo').select_related('foo__bar__baz'), in order to fetch foo, bar and baz in one query on the first access to foo? (Does that actually work?)

Anyway, I think we should fix the bug and all affected tests with one patch.

in reply to:  1 comment:2 by Michal Petrucha, 12 years ago

Owner: changed from nobody to Michal Petrucha

Replying to aaugustin:

It seems to me that select_related is the opposite of defer. Is there a use case for using both on the same field? Maybe it's .defer('foo').select_related('foo__bar__baz'), in order to fetch foo, bar and baz in one query on the first access to foo? (Does that actually work?)

Nope, that does not work nor do I think there's an easy way to make it, at least with the current implementation of deferred fields. The whole example I've given is based on the two failing test cases, I can't imagine any sane use case at the moment.

I think the test was meant more like something to make sure the ORM doesn't omit a field on which it makes a join, i. e. issuing a select_related on a deferred field will make the field immediate.

I've come up with a way to fix this, I'll try to implement it later today.

comment:3 by Aymeric Augustin, 12 years ago

So shouldn't we forbid using defer and select_related on the same field entirely, by raising an exception?

It's already broken, so that change wouldn't be too much backwards incompatible.

in reply to:  3 comment:4 by Michal Petrucha, 12 years ago

Replying to aaugustin:

So shouldn't we forbid using defer and select_related on the same field entirely, by raising an exception?

The problem is, what should happen if you issue a select_related with a depth and then defer a ForeignKey field? Should it raise an exception as well?

Other than that, I think both routes are about the same in terms of complexity of solution, either way you have to detect the affected ForeignKeys and react accordingly.

comment:5 by Michal Petrucha, 12 years ago

Okay, took me a little longer but I have an almost working solution to this with the following behavior: an explicit defer combined with select_related on the same field results in an InvalidQuery; deferring a ForeignKey and using a depth-based select_related results in the specific relationship being skipped.

Now the problem is, what should happen when someone uses only and select_related in a way that the ForeignKey would be omitted? Do we want to issue the exception as well or do we want to automatically add the field to the set of loaded ones? Currently it behaves the first way, though it might make sense in the second way as well.

It is used in regressiontests.select_related_regress.SelectRelatedRegressTests.test_regression_12851 this way at the moment – should I go ahead and update the test or do we want to keep the test case and add more magic?

Personally I'm in favor of the first behavior, i. e. raising an exception, as it is more explicit. Stating I only want to load other fields and then asking for a select_related on an omitted field seems contradictory to me but there may be those who disagree.

by Michal Petrucha, 12 years ago

Patch containing fixes for both issues along with tests and docs

comment:6 by Andrei Antoukh, 12 years ago

Cc: niwi@… added

Here is the original patch, updated to the latest version of django. (https://github.com/niwibe/django/tree/ticket_17485)

I do not know the status of this ticket, but the solution seems reasonable.

comment:7 by Anssi Kääriäinen, 12 years ago

I took a look at the patch and it seems good to me. At first I was going to suggest that select_related() should just override .only() and .defer() for the related fields, but thinking it more carefully it is more explicit to disallow this case, and even more importantly there are already two votes for erroring out in these cases. The sad thing is there is a minor backwards incompatibility for the .only() case (right?), but I guess that problem could be dismissed as being a bug fix.

I would like to add an assert to the deferred class factory which prevents the creation of deferred classes using field.name instead of field.attname.

It seems there is some need to refactor, or at least comment the select_related implementation in models/sql/compiler.py and then in models/query.py. Both the query.py and compiler.py code must produce same columns and column positions for the select, so there is one possibility for cleanup. In addition, the compiler.py join logic is very hard to understand currently, and at least some comments in there is needed.

I tried to do some of the refactoring for this ticket, but soon realised the minor changes cascade to all around the select_related code. So, for this ticket the above refactoring should be skipped.

Before moving forward here, any thought on the backwards incompatibility issue? This would be disallowed by the new code:

qs.only('name').select_related('related')

it would raise an error, and instead you would need to do:

qs.only('name', 'related').select_related('related')

I doubt there are many users who would see this issue...

comment:8 by Michal Petrucha, 12 years ago

Actually, I'm not entirely convinced by the patch myself, mainly for reasons you already mentioned – there is some code duplication in place which should ideally be refactored. At the moment, however, I don't feel like attempting to do this kind of refactor, maybe sometime after composite fields are finished...

As for the backwards compatibility issue, personally I'd prefer being more explicit. I think the first line is a contradiction – first it says not to load the field related and then it asks for a join on this field. I don't think this makes much sense and there are two ways of interpreting this, we can either ignore the request to defer the field or the request to make a join. Either way the line is ambiguous and for me it makes sense to require the user to be consistent.

I'm going to take care of my github branch and will make a pull request later...

comment:9 by Anssi Kääriäinen, 12 years ago

The thing is, your fix is pretty straightforward. Yes, there is room for larger refactoring, but that refactoring will take a lot of time. It would be nice to just fix this one issue, so this would not block solving other issues. Then later on it would be possible to tackle the select_related total-refactor (TM).

This is why I think we should go with your patch. It isn't perfect, but solves the immediate issue at hand.

comment:10 by Michal Petrucha, 12 years ago

Took me longer to update my repo than I expected, sorry about that.

Two updates to the patch:

  1. I bumped the version number in the docs.
  2. The original patch breaks the test case introduced with a fix to #17876. This test case seems to be broken as it expects the related field to be pulled in.

Pull request: https://github.com/django/django/pull/105

comment:11 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: newclosed

In [b6c356b7bb97f3d6d4831b31e67868313bbbc090]:

Fixed #17485 -- Made defer work with select_related

This commit tackles a couple of issues. First, in certain cases there
were some mixups if field.attname or field.name should be deferred.
Field.attname is now always used.

Another issue tackled is a case where field is both deferred by
.only(), and selected by select_related. This case is now an error.

A lot of thanks to koniiiik (Michal Petrucha) for the patch, and
to Andrei Antoukh for review.

comment:12 by Tai Lee, 12 years ago

Cc: real.human@… added
Resolution: fixed
Status: closedreopened

Looks like there's a bug in this commit that was not caught by the included tests. When I try to use only() and select_related() and pass the same fields to both, spanning multiple FKs I get an erroneous exception telling me that I can't use select_related for a deferred field. That's not what I'm doing. I'm using select_related for a field that is explicitly NOT deferred.

I have written a patch with a new test, but I don't have a fix because I don't fully understand what the code in this commit is doing.

by Tai Lee, 12 years ago

failing test case

comment:13 by Anssi Kääriäinen, 12 years ago

Owner: changed from Michal Petrucha to Anssi Kääriäinen
Severity: NormalRelease blocker
Status: reopenednew

Seems like I have some fixing to do...

comment:14 by Anssi Kääriäinen, 12 years ago

Triage Stage: AcceptedReady for checkin

A fix for this is available from: https://github.com/akaariai/django/tree/ticket_17485_fix

The problem was that for deeper nestings of select_related + only the only() loaded fields were taken for self.query.model instead of the model the field was located in. This managed to work correctly when self.query.model was the same model as the field was in.

comment:15 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: newclosed

In f399a804c9436a5d3ef9e2b73c337904af2c753d:

Fixed #17485 regression -- only + select_related interaction

When doing deeper than one level select_related() + only queries(), the
code introduced in b6c356b7bb97f3d6d4831b31e67868313bbbc090 errored
incorrectly.

Thanks to mrmachine for report & test case.

Note: See TracTickets for help on using tickets.
Back to Top