Opened 6 years ago

Closed 2 years ago

#10733 closed Bug (fixed)

Invalid results when deferring fields in more than one related model with only()

Reported by: mrts Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: efficient-admin
Cc: carljm, ruosteinen, walter+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Given the same models as in #10710, the following works as expected:

>>> results = C.objects.all().only('name', 'a', 'b').select_related()
>>> results[0].a.name
u'a2'
>>> results[0].b.name
u'b1'

, but the following does not pull in the second related model field (b.name):

>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related()
>>> results[0].a.name
u'a2'
>>> results[0].b.name
''

Attachments (2)

deferred_related_fields_test.diff (4.1 KB) - added by adurdin 6 years ago.
Test to replicate the bug
deferred_related_fields.diff (522 bytes) - added by adurdin 6 years ago.
Initial patch

Download all attachments as: .zip

Change History (32)

comment:1 Changed 6 years ago by mrts

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Looks like b.id is actually None in that case:

>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related()
>>> results[0].b.id is None
True

Another odd side-effect -- when accessing the deferred fields, bogus values are returned (as in #10710, notice how lots_of_text returns the id value):

>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related()
>>> results[0].a.lots_of_text
2
>>> results[0].a.id
2
>>> results[0].a.name
u'a2'

Running on

>>> django.get_version()
u'1.1 beta 1 SVN-10395'

comment:2 Changed 6 years ago by mrts

  • Component changed from Uncategorized to Database layer (models, ORM)

comment:3 Changed 6 years ago by mrts

  • milestone set to 1.1

comment:4 Changed 6 years ago by Alex

I looked at this for a while and didn't make any huge progress, but I think I found 2 issues(maybe :/):

  1. The SQL for Leaf.objects.only('child__name', 'second_child__name').select_related() is only doing 1 join, and I think it needs to be doing 2.
  1. It seems that neither of those child models is actually a deferred model there.

comment:5 Changed 6 years ago by mrts

  • Keywords efficient-admin added

comment:6 Changed 6 years ago by jacob

  • Triage Stage changed from Unreviewed to Accepted

comment:7 Changed 6 years ago by adurdin

  • Owner changed from nobody to adurdin
  • Status changed from new to assigned

Changed 6 years ago by adurdin

Test to replicate the bug

comment:8 Changed 6 years ago by adurdin

I've looked into this a little; the problem is not in the sql generation, as the expected columns and joins are all present.

I've tracked the problem down to somewhere in QuerySet.iterator(). In the test case, the model instantiation when selecting the single result is:

DeferredRelatedC_Deferred_is_published_lots_of_text(a_id=1, b_id=2, id=3, name=u'c3')
DeferredRelatedA(1, u'a1', 2, u'b2')
DeferredRelatedB()

That is, the kwargs are not being set properly when instantiating the two related models.

Changed 6 years ago by adurdin

Initial patch

comment:9 Changed 6 years ago by adurdin

  • Has patch set
  • Patch needs improvement set

The fault is in django/models/query.py:get_cached_row(): when it recurses it was not passing the only_load dict down the stack. The attached patch fixes this.

However I noticed that the recursion is not passing anything for the offset parameter, which is to do with aggregation queries. I can't say for sure at the moment, but it looks to me like an additional bug relating to aggregated queries with related models could be hiding in here as well. Or is the offset parameter only relevant to the top-level model?

comment:10 Changed 6 years ago by adurdin

  • Owner changed from adurdin to nobody
  • Status changed from assigned to new

comment:11 Changed 6 years ago by russellm

  • Resolution set to fixed
  • Status changed from new to closed

(In [10928]) Fixed #10733 -- Added a regression test for queries with multiple references to multiple foreign keys in only() clauses. Thanks to mrts for the report.

comment:12 Changed 6 years ago by russellm

I suspect this may have been corrected as a result of the fixes for #10572 in [10926]. I've added a regression test case to make sure, but if I'm missing something here, please reopen.

comment:13 Changed 6 years ago by mrts

Indeed, this has been corrected by [10926]. Thanks!

comment:14 Changed 6 years ago by mrts

  • Resolution fixed deleted
  • Status changed from closed to reopened

Alas, given the above models, the following happens:

>>> from only_breakage.models import C
>>> from django.db import connection
>>> import copy

Expected

All three tables joined:

>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related().query.as_sql()
('SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id",
"only_breakage_a"."id", "only_breakage_a"."name",
"only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c"
INNER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id")
INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")', ())

A single query fetches all what's needed:

>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related()
>>> connection.queries
[]
>>> results[0].a.name
u'a2'
>>> connection.queries
[{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id",
"only_breakage_a"."id", "only_breakage_a"."name",
"only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c"
INNER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id")
INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")
LIMIT 1',
  'time': '0.002'},
>>> queries = copy.deepcopy(connection.queries)
>>> results[0].b.name
u'b1'
>>> assert connection.queries == queries

Actually got

Only two tables joined (the A table is discarded):

>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related().query.as_sql()
('SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id",
"only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c"
INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")', ())

Three queries happen:

>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related()
>>> connection.queries
[]
>>> results[0].a.name
u'a2'
>>> connection.queries
[{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1',
  'time': '0.002'},
 {'sql': u'SELECT "only_breakage_a"."id", "only_breakage_a"."name",
"only_breakage_a"."lots_of_text", "only_breakage_a"."a_field" FROM "only_breakage_a" WHERE "only_breakage_a"."id" = 2 ',
  'time': '0.000'}]
>>> results[0].b.name
u'b1'
>>> connection.queries
[{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1',
  'time': '0.002'},
 {'sql': u'SELECT "only_breakage_a"."id", "only_breakage_a"."name",
"only_breakage_a"."lots_of_text", "only_breakage_a"."a_field" FROM "only_breakage_a" WHERE "only_breakage_a"."id" = 2 ',
  'time': '0.000'},
 {'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name",
"only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1',
  'time': '0.000'}]

comment:15 Changed 6 years ago by russellm

  • milestone changed from 1.1 to 1.2

Ok - there are actually three problems here:

  1. C.objects.all().select_related() does one join, not two.
  2. The aname clause in only() doesn't roll out to the right field inclusion list in the queryset
  3. The internal only_load data structure can't differentiate between aname and bname.

Point 1 should probably be it's own ticket (it strikes me that it probably already is, but a quick search didn't reveal an obvious candidate). The workaround is to explicitly specify select_related('a','b').

Points 2 and 3 are closely related, but they aren't simple fixes.

I'm going to punt this for v1.1. None of these problems cause data loss or incorrect answers, and I can't see any reason that these problems can't be fixed while preserving backwards compatibility in the only() interface. I will grant that these problems are inconvenient and lead to inefficiencies, but if we're sticking to a time-based release schedule, we're going to have to live with the fact that releases are going to have some bugs.

That said, if someone is particularly affected by this problem and is able to work up a patch before v1.1 I'm happy to look at getting it into trunk.

comment:16 Changed 6 years ago by mrts

Right you are, select_related('a', 'b').query.as_sql() looks correct:

>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related('a', 'b').query.as_sql() 
('SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id",
"only_breakage_a"."id", "only_breakage_a"."name",
"only_breakage_b"."id", "only_breakage_b"."name"
FROM "only_breakage_c"
LEFT OUTER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id")
INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")',
 ())

Doesn't affect the behaviour though, problems 1-3 remain.

comment:17 follow-up: Changed 6 years ago by ccahoon

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [10996]) Fixed #10733 -- Added a regression test for queries with multiple references to multiple foreign keys in only() clauses. Thanks to mrts for the report.

comment:18 in reply to: ↑ 17 ; follow-up: Changed 6 years ago by mrts

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to ccahoon:

As Russell said, this is not fixed. Reopening.

comment:19 in reply to: ↑ 18 Changed 6 years ago by anonymous

Replying to mrts:

Replying to ccahoon:

As Russell said, this is not fixed. Reopening.

mrts -- Sorry about that, I had a major committing SNAFU in June and didn't see that the post-commit track script had messed this ticket up.

comment:20 Changed 6 years ago by drdaeman

Shouldn't at least a simple patch, correctly passing only_load in db/models/query.py:get_cached_row() be included into trunk?

This won't solve all problems, but will allow simple cases to work properly which is certainly better than nothing.

comment:21 Changed 6 years ago by carljm

  • Cc carljm added

comment:22 Changed 6 years ago by ruosteinen

  • Cc ruosteinen added

So, what is the status here? Are we going to see this in 1.2?

comment:23 Changed 5 years ago by russellm

  • milestone changed from 1.2 to 1.3

Not critical for 1.2

comment:24 Changed 5 years ago by wdoekes

  • Cc walter+django@… added

comment:25 Changed 4 years ago by SmileyChris

  • Severity set to Normal
  • Type set to Bug

comment:26 Changed 4 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

comment:11 Changed 4 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:12 Changed 4 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:13 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:14 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 8d4b1629d4434b8b30078d064e74689ca8906f37:

Fixed #10733 -- select_related().only() failure

The bug was reported pre 1.1, and somewhere along the way it has been
fixed. So, this is tests only addition.

Note: See TracTickets for help on using tickets.
Back to Top