Opened 13 years ago
Closed 12 years ago
#17485 closed Bug (fixed)
Queries with both deferred fields and select_related defer field.name instead of field.attname
Reported by: | Michal Petrucha | Owned by: | Anssi Kääriäinen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Release blocker | Keywords: | defer select_related |
Cc: | niwi@…, real.human@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When deferring ForeignKey
fields in conjunction with select_related
, Django creates a DeferredAttribute
for the field's name
instead of its attname
. This results in its ReverseSingleRelatedObjectDescriptor
being overriden and thus stripping the model instance of most of this field's functionality.
Consider the following models (taken from regressiontests.queries):
class Celebrity(models.Model): name = models.CharField("Name", max_length=20) greatest_fan = models.ForeignKey("Fan", null=True, unique=True) def __unicode__(self): return self.name class Fan(models.Model): fan_of = models.ForeignKey(Celebrity)
Let's play around with it in the shell for a bit:
>>> Celebrity.objects.create(name="Joe") <Celebrity: Joe> >>> obj = Celebrity.objects.all().defer('greatest_fan').select_related('greatest_fan').get() >>> print obj.__class__.__dict__ { '__module__': 'subclassfktest.models', '_meta': <Options for Celebrity_Deferred_greatest_fan>, 'objects': <django.db.models.manager.ManagerDescriptor object at 0x1460810>, 'MultipleObjectsReturned': <class 'subclassfktest.models.MultipleObjectsReturned'>, '_base_manager': <django.db.models.manager.Manager object at 0x1460490>, 'greatest_fan': <django.db.models.query_utils.DeferredAttribute object at 0x1457750>, 'DoesNotExist': <class 'subclassfktest.models.DoesNotExist'>, '__doc__': 'Celebrity_Deferred_greatest_fan(id, name, greatest_fan_id)', '_default_manager': <django.db.models.manager.Manager object at 0x1460ad0>, '_deferred': True }
We can see that the deferred atribute is at greatest_fan
instead of greatest_fan_id
and the ReverseSingleRelatedObjectDescriptor
is not present at all.
The following is just to show that the model instance is really broken because of this:
>>> print obj.greatest_fan None >>> f = Fan.objects.create(fan_of=obj) >>> obj.greatest_fan = f >>> obj.save() >>> obj2 = Celebrity.objects.get() >>> f.id 1 >>> print obj2.greatest_fan_id # Should be 1 None
An interesting thing, though, is that the instance gives access to the right related object instance even without a working ReverseSingleRelatedObjectDescriptor
:
>>> obj2.greatest_fan = f >>> obj2.save() >>> obj = Celebrity.objects.all().defer('greatest_fan').select_related('greatest_fan').get() >>> obj.greatest_fan.id 1 >>> obj.__dict__ {'_greatest_fan_cache': <Fan: Fan object>, 'name': u'Joe', 'greatest_fan_id': None, '_state': <django.db.models.base.ModelState object at 0x146b250>, 'greatest_fan': <Fan: Fan object>, 'id': 1}
Anyway, the _id
attribute is still never set to the correct value.
The fix for this behavior is a one-liner, see attachment. It makes sure to defer the field's attname
even in this case (in all other cases where deferred_class_factory
is called, attnames
are used). There are side effects, though.
Actually, this bug was hiding another one for which even tests exist but because of this one they pass.
The first test failure is test_basic
in regressiontests.defer_regress.tests.DeferRegressionTest
which is quite obvious, it lists the incorrect model name caused by this bug and is fixed easily (since it is actually an incorrect test).
The other one, though, is not that simple: test_defer
in modeltests.defer.tests.DeferTests
. More precisely the following two lines:
# DOES THIS WORK? self.assert_delayed(qs.only("name").select_related("related")[0], 1) self.assert_delayed(qs.defer("related").select_related("related")[0], 0)
where related
is a ForeignKey.
These have been around for a while and the only reason they passed until now is that assert_delayed
checks the attname
of each field whether it is deferred or not. In this case, however, the attname
is obviously not deferred, since the name
is in its stead.
The question is, what should we do with these two tests? Do we want to fix them in one go with this bug or file a new ticket and temporarily comment out the two failing test lines?
Attachments (3)
Change History (18)
by , 13 years ago
Attachment: | defer_select_related.patch added |
---|
follow-up: 2 comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
You've clearly demonstrated that this is a bug: Django misbehaves badly in this situation.
It seems to me that select_related
is the opposite of defer
. Is there a use case for using both on the same field? Maybe it's .defer('foo').select_related('foo__bar__baz')
, in order to fetch foo
, bar
and baz
in one query on the first access to foo
? (Does that actually work?)
Anyway, I think we should fix the bug and all affected tests with one patch.
comment:2 by , 13 years ago
Owner: | changed from | to
---|
Replying to aaugustin:
It seems to me that
select_related
is the opposite ofdefer
. Is there a use case for using both on the same field? Maybe it's.defer('foo').select_related('foo__bar__baz')
, in order to fetchfoo
,bar
andbaz
in one query on the first access tofoo
? (Does that actually work?)
Nope, that does not work nor do I think there's an easy way to make it, at least with the current implementation of deferred fields. The whole example I've given is based on the two failing test cases, I can't imagine any sane use case at the moment.
I think the test was meant more like something to make sure the ORM doesn't omit a field on which it makes a join, i. e. issuing a select_related
on a deferred field will make the field immediate.
I've come up with a way to fix this, I'll try to implement it later today.
follow-up: 4 comment:3 by , 13 years ago
So shouldn't we forbid using defer and select_related on the same field entirely, by raising an exception?
It's already broken, so that change wouldn't be too much backwards incompatible.
comment:4 by , 13 years ago
Replying to aaugustin:
So shouldn't we forbid using defer and select_related on the same field entirely, by raising an exception?
The problem is, what should happen if you issue a select_related
with a depth and then defer a ForeignKey
field? Should it raise an exception as well?
Other than that, I think both routes are about the same in terms of complexity of solution, either way you have to detect the affected ForeignKey
s and react accordingly.
comment:5 by , 13 years ago
Okay, took me a little longer but I have an almost working solution to this with the following behavior: an explicit defer
combined with select_related
on the same field results in an InvalidQuery
; deferring a ForeignKey
and using a depth-based select_related
results in the specific relationship being skipped.
Now the problem is, what should happen when someone uses only
and select_related
in a way that the ForeignKey
would be omitted? Do we want to issue the exception as well or do we want to automatically add the field to the set of loaded ones? Currently it behaves the first way, though it might make sense in the second way as well.
It is used in regressiontests.select_related_regress.SelectRelatedRegressTests.test_regression_12851
this way at the moment – should I go ahead and update the test or do we want to keep the test case and add more magic?
Personally I'm in favor of the first behavior, i. e. raising an exception, as it is more explicit. Stating I only want to load other fields and then asking for a select_related
on an omitted field seems contradictory to me but there may be those who disagree.
by , 13 years ago
Attachment: | defer_select_related-throws_exception.patch added |
---|
Patch containing fixes for both issues along with tests and docs
comment:6 by , 12 years ago
Cc: | added |
---|
Here is the original patch, updated to the latest version of django. (https://github.com/niwibe/django/tree/ticket_17485)
I do not know the status of this ticket, but the solution seems reasonable.
comment:7 by , 12 years ago
I took a look at the patch and it seems good to me. At first I was going to suggest that select_related() should just override .only() and .defer() for the related fields, but thinking it more carefully it is more explicit to disallow this case, and even more importantly there are already two votes for erroring out in these cases. The sad thing is there is a minor backwards incompatibility for the .only() case (right?), but I guess that problem could be dismissed as being a bug fix.
I would like to add an assert to the deferred class factory which prevents the creation of deferred classes using field.name instead of field.attname.
It seems there is some need to refactor, or at least comment the select_related implementation in models/sql/compiler.py and then in models/query.py. Both the query.py and compiler.py code must produce same columns and column positions for the select, so there is one possibility for cleanup. In addition, the compiler.py join logic is very hard to understand currently, and at least some comments in there is needed.
I tried to do some of the refactoring for this ticket, but soon realised the minor changes cascade to all around the select_related code. So, for this ticket the above refactoring should be skipped.
Before moving forward here, any thought on the backwards incompatibility issue? This would be disallowed by the new code:
qs.only('name').select_related('related')
it would raise an error, and instead you would need to do:
qs.only('name', 'related').select_related('related')
I doubt there are many users who would see this issue...
comment:8 by , 12 years ago
Actually, I'm not entirely convinced by the patch myself, mainly for reasons you already mentioned – there is some code duplication in place which should ideally be refactored. At the moment, however, I don't feel like attempting to do this kind of refactor, maybe sometime after composite fields are finished...
As for the backwards compatibility issue, personally I'd prefer being more explicit. I think the first line is a contradiction – first it says not to load the field related
and then it asks for a join on this field. I don't think this makes much sense and there are two ways of interpreting this, we can either ignore the request to defer the field or the request to make a join. Either way the line is ambiguous and for me it makes sense to require the user to be consistent.
I'm going to take care of my github branch and will make a pull request later...
comment:9 by , 12 years ago
The thing is, your fix is pretty straightforward. Yes, there is room for larger refactoring, but that refactoring will take a lot of time. It would be nice to just fix this one issue, so this would not block solving other issues. Then later on it would be possible to tackle the select_related total-refactor (TM).
This is why I think we should go with your patch. It isn't perfect, but solves the immediate issue at hand.
comment:10 by , 12 years ago
Took me longer to update my repo than I expected, sorry about that.
Two updates to the patch:
- I bumped the version number in the docs.
- The original patch breaks the test case introduced with a fix to #17876. This test case seems to be broken as it expects the related field to be pulled in.
Pull request: https://github.com/django/django/pull/105
comment:11 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:12 by , 12 years ago
Cc: | added |
---|---|
Resolution: | fixed |
Status: | closed → reopened |
Looks like there's a bug in this commit that was not caught by the included tests. When I try to use only()
and select_related()
and pass the same fields to both, spanning multiple FKs I get an erroneous exception telling me that I can't use select_related
for a deferred field. That's not what I'm doing. I'm using select_related
for a field that is explicitly NOT deferred.
I have written a patch with a new test, but I don't have a fix because I don't fully understand what the code in this commit is doing.
comment:13 by , 12 years ago
Owner: | changed from | to
---|---|
Severity: | Normal → Release blocker |
Status: | reopened → new |
Seems like I have some fixing to do...
comment:14 by , 12 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
A fix for this is available from: https://github.com/akaariai/django/tree/ticket_17485_fix
The problem was that for deeper nestings of select_related + only the only() loaded fields were taken for self.query.model instead of the model the field was located in. This managed to work correctly when self.query.model was the same model as the field was in.
comment:15 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fix for the first bug along with tests for it.