Opened 5 years ago

Last modified 20 months ago

#30124 assigned Bug

QuerySet.only() on reverse many-to-one relationship causes extra database queries

Reported by: Beda Kosata Owned by: Dan Davis
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using the following example models:

class WorkSet(models.Model):
    name = models.TextField(blank=True)

class Work(models.Model):
    name = models.TextField(blank=True)
    work_set = models.ForeignKey(WorkSet, on_delete=models.CASCADE, related_name='works')
    extra_data = JSONField(default=dict, blank=True)

when I iterate over the Works related to a WorkSet ws like this:

for work in ws.works.all().only('name'):
    print(work.name)

Not only one query is made to the database, but 1+N where N is the number of Works related to the WorkSet. In connection.queries these queries look like this

'SELECT "test_work"."id", "test_work"."work_set_id" FROM "test_work" WHERE "test_work"."id" = 677931'

etc. That is - the ORM refetches the work_set_id for each of the Works iterated over. If I change the loop to explicitly include work_set_id , only one query is run:

for work in ws.works.all().only('name', 'work_set_id'):
    print(work.name)

While this behavior makes some sense, I find it quite unexpected and a potential cause of nasty performance 'leaks'. It think that either the documentation should explicitly mention this case and how to work around it or better the ORM should automatically add the relationship key into the fetched fields.

Please let me know if anything is unclear about this issue and if there is anything I can improve.

Change History (6)

comment:1 by Tim Graham, 5 years ago

Triage Stage: UnreviewedAccepted

I'm not sure if QuerySet.only() is meant to work with reverse relationships (see #23051 for another issue) but as I said there, a helpful message should be raised if the decision is not to support it.

comment:2 by Dan Davis, 5 years ago

Owner: changed from nobody to Dan Davis
Status: newassigned

I'd like some feedback of whether it is worth fixing this rather than documenting the issue. My inclination is to fix it - the query object has an alias_map for the join at the moment that only is called, and so whether that's the best way to detect the reverse relationship or not, there is certainly some way:

qs1 = ws.works.get_queryset()
qs1.query.alias_map['queries_workset']      # I put them in tests/queries/models.py for a start

I'm not sure quite how to select the reverse relationship, here is the join.dict:

{'table_name': 'queries_workset',
 'parent_alias': 'queries_work',
 'table_alias': 'queries_workset',
 'join_type': 'INNER JOIN',
 'join_cols': (('work_set_id', 'id'),),
 'join_field': <django.db.models.fields.related.ForeignKey: work_set>,
 'nullable': False,
 'filtered_relation': None}

Least interesting part for me - I've validated this by writing a failing test case. I'm counting queries using @override_settings(DEBUG=True), and I'm not sure that's a good implementation strategy. But anyway, I'm more interested in understanding how Django structures joins now that I've gotten so good at annotate as a user ;)

comment:3 by Dan Davis, 5 years ago

I'd guess the key is in understanding how the alias_map will look under different circumstances. When the WorkSet is the base table, and I examine this:

WorkSet.objects.filter(works__name='Flying Car').query.alias_map['queries_work'].__dict__

I see this:

{'table_name': 'queries_work',
 'parent_alias': 'queries_workset',
 'table_alias': 'queries_work',
 'join_type': 'INNER JOIN',
 'join_cols': (('id', 'work_set_id'),),
 'join_field': <ManyToOneRel: queries.wolrk>,
 'nullable': True,
 'filtered_relation': None}

The other problem is that even if beginner Django Dev me can fix this particular problem, getting the treatment of only/defer in the presence of joins completely consistent will be a bit of a challenge. I know annotations will not excluded by only() or defer().

comment:4 by Dan Davis, 5 years ago

There seem to be two general approaches here:

only is intended for fields of the base table

So, if the alias_map has cardinality > 1, then detect fields that are not from the base table and raise a warning. Since only is an optimization, the warning alerts the user to the problem, but his/her webapp simply continues to work, albeit slower. In a reverse relationship, the parent's key is a field of the base table - add it automatically. This doesn't necessarily mean that the pk of the parent table will be added. If the ForeignKey is on something other than the parent pk, then the parent pk is not what is added to the response. This would fix this issue, #30124, and in the case of #23051, a warning would be raised and the call to only would not modify the query. Since only is an optimization, this may be acceptable.

only is not intended for joins

So, if the table_map has cardinality > 1, then raise some warning so that we can gain feedback from users, and if reasonable, deprecate at a later time. The code in #30124 would have to be done with defer() or values(), which could in a real-world case be a lot of fields and a great loss of utility. #23051 would similarly cause a warning, but here the warning might be better received ;).

Are there implications of this on other query types, such as union/difference, etc.?

comment:5 by Dan Davis, 5 years ago

New idea - in the case of a select_related(), then the natural expectation is that only will also apply to fields in the related model. To finish this, I will need to understand better how the select tuple in django.db.models.sql.query.Query is populated, and when. I know it is deferred to the end. I will check.

comment:6 by Simon Charette, 20 months ago

Having spent a bit of time in refactoring only and defer for #21204 (PR) I think the proper solution is likely not to introspect aliases and try to make the best of it but to add a new attribute to Query that denotes fields that must always be present when the select mask is generated.

We already do so for all primary keys of models involved in select_related/only or defer combinations so if this attribute was present and the get_queryset method of reverse manager set it as queryset.query.select_mask = {rel} and then a copy of this value could be used in _get_defer_select_mask and _get_only_select_mask.

Note: See TracTickets for help on using tickets.
Back to Top