Opened 2 years ago

Last modified 5 months ago

#29214 assigned Bug

Invalid SQL generated when annotating a subquery with an outerref to an annotated field.

Reported by: Oskar Persson Owned by: Chetan Khanna
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset annotations
Cc: felixxm, Simon Charette, Chetan Khanna Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

An SQL-error is raised when updating a queryset which includes an annotation created using a subquery, which in turn also uses an annotation.

class Recursive(models.Model):
    name = models.CharField(max_length=10)
    link = models.IntegerField()
    parent = models.ForeignKey('self', models.CASCADE, null=True)

from django.db.models import OuterRef, Subquery, F

rec1 = Recursive.objects.create(name="r1", link=1)
rec2 = Recursive.objects.create(name="r2", link=1)
rec3 = Recursive.objects.create(name="r11", parent=rec1, link=2)

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))
Traceback (most recent call last):
  File "tests.py", line 88, in test_update_annotated_using_related_queryset
    .update(parent_id=F('p'))
  File "/git/django/django/db/models/query.py", line 647, in update
    rows = query.get_compiler(self.db).execute_sql(CURSOR)
  File "/git/django/django/db/models/sql/compiler.py", line 1204, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/git/django/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
OperationalError: no such column: T2.link

I've tried this in SQLite and MySQL in Django 1.11.11 and Django 2.0 using Python 3.6.0. All raising similar errors.

Attachments (1)

ticket29214.diff (1.5 KB) - added by Chetan Khanna 5 months ago.
Test file

Download all attachments as: .zip

Change History (18)

comment:1 Changed 2 years ago by Tim Graham

Description: modified (diff)
Summary: SQL-error when updating after annotating using subqueryInvalid SQL when updating after annotating using subquery
Triage Stage: UnreviewedAccepted

comment:2 Changed 2 years ago by Carlton Gibson

Related: #29542

comment:3 Changed 2 years ago by felixxm

Cc: felixxm added

comment:4 Changed 2 years ago by Oskar Persson

Just tried this with the fix for #29542 and the problem remains.

comment:5 Changed 20 months ago by Simon Charette

Summary: Invalid SQL when updating after annotating using subqueryInvalid SQL generated when annotating a subquery with an outerref to an annotated field.

#30009 was a duplicate that wasn't relying on update().

comment:6 Changed 20 months ago by Simon Charette

As mentioned in #30009 this has something to do with alias quoting and is probably related to Subquery.resolve_expression's alteration of external_alias as a comment mentions it's done to prevent quoting.

comment:7 Changed 6 months ago by Oskar Persson

Version: 1.11

comment:8 Changed 6 months ago by Oskar Persson

Version: master

comment:9 in reply to:  5 Changed 5 months ago by Chetan Khanna

Replying to Simon Charette:

#30009 was a duplicate that wasn't relying on update().

I don't think this issue is strictly related to annotate. Possibly, its the update() clause. If we remove the update() clause, the query runs just fine.

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .values()

which produces the following SQL:

SELECT "recursive_model"."id",
       "recursive_model"."name",
       "recursive_model"."link",
       "recursive_model"."parent_id",
       T2."link" AS "parent_link",
       (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       ) AS "p"
  FROM "recursive_model"
 INNER JOIN "recursive_model" T2
    ON ("recursive_model"."parent_id" = T2."id")
 WHERE "recursive_model"."parent_id" IS NOT NULL
 LIMIT 21

Also the query on https://code.djangoproject.com/ticket/30009 seems to run just fine on current master. (I had to tweak it a bit since the otherwise the ORM complained about multiple columns being returned)

query:

Task.objects.annotate(                          
    top_case_id=Coalesce(F('case__parent_case__parent_case_id'), F('case__parent_case_id'), F('case_id')),
    subject=Subquery(Subject.objects.filter(case_id=OuterRef('top_case_id')).values('id'), output_field=IntegerField())
 ).all()

corresponding SQL:

SELECT "ticket29214_task"."id",
       "ticket29214_task"."num",
       "ticket29214_task"."case_id",
       COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id") AS "top_case_id",
       (
        SELECT U0."id"
          FROM "ticket29214_subject" U0
         WHERE U0."case_id" = COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id")
       ) AS "subject"
  FROM "ticket29214_task"
  LEFT OUTER JOIN "ticket29214_case"
    ON ("ticket29214_task"."case_id" = "ticket29214_case"."id")
  LEFT OUTER JOIN "ticket29214_case" T3
    ON ("ticket29214_case"."parent_case_id" = T3."id")
 LIMIT 21

(No extra quoting aroung "T3" as mentioned in the description)

Database used was PostgreSQL and SQLs are picked from shell_plus of django_extensions.

comment:10 Changed 5 months ago by Simon Charette

Cc: Simon Charette added

Chetan did you reproduce the update() crash against latest master as well? There has been a few tweaks to subquery aliases quoting in the past weeks.

Could you provide the generated problematic UPDATE SQL since it's missing from the ticket?

comment:11 Changed 5 months ago by Chetan Khanna

Cc: Chetan Khanna added

Yes. On running the same query, I get the following SQL and traceback:

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))

SQL:

UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )

Traceback:

ProgrammingError: missing FROM-clause entry for table "t2"
LINE 1: ...."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ...

Unfortunately I haven't looked deeper, but I am planning to during this weekend. I'll report if I get something else.

comment:12 Changed 5 months ago by Simon Charette

Thanks!

Something suspicious here is the usage of F('parent__link') which should be disallowed per https://code.djangoproject.com/ticket/14104#comment:1. We don't support JOINs in UPDATE unless I'm mistaken so I suspect the JOIN'ed T2 table is simply get pruned by SQLUpdateCompiler.

The only way to add support for such query would be to make OuterRef('parent__link') result in a JOIN within the subquery, notice how the filter(parent__isnull=False) predicate results in a subquery instead of an INNER JOIN like it does when .update is not used in comment:9.

I assume we'd want do something similar for OuterRef that include a __ so the resulting query is

latest_parent = Recursive.objects.filter(link=OuterRef('parent__link')).order_by('id')
Recursive.objects.filter(
    parent__isnull=False
).update(parent_id=Subquery(latest_parent.values('pk')[:1])
UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" IN (
             SELECT U1."link"
             FROM "recursive_model" U1
             WHERE U1."id" = "recursive_model"."parent_id"
         )
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )

comment:13 Changed 5 months ago by Chetan Khanna

That comment was really insightful, thank you! :)

If you don't mind confirming, then this is what I could get from the comment:
Whenever there is a filter() clause, preceding an update() clause, and containing fields from related models joined via __, the ORM gives back a SQL with an INNER JOIN inside a subquery. And for this issue, we want to have a similar functionality in the OuterRef that contain __.

If so, I think I have created a test from our test suite that might serve as a starting point.

Changed 5 months ago by Chetan Khanna

Attachment: ticket29214.diff added

Test file

comment:14 Changed 5 months ago by Simon Charette

Chetan, almost correct. The query will not necessarily involve generating an INNER JOIN but it happens to do so in this case.

Most of the logic lives in SQLUpdateCompiler but the idea is that if the backend supports it the whole filtering part of the queryset will be pushed down to a subquery, JOINs included, and filtered against using pk IN.

This will be way trickier to implement for subqueries and I'm starting to wonder if we didn't get it wrong in the first place by making OuteRef('outerqs__lookup') result in a new JOIN in the outer query instead of encapsulating it in the subquery. The way we do it right now breaks aggregation in subtle ways #29214 and happens to break update as well as we came to discover.

If the JOIN was always encapsulated in the subquery in the first place it would solve this ticket, #29214 and I suspect it would make implementing #28296 way easier.

comment:15 Changed 5 months ago by Chetan Khanna

Has patch: set
Owner: changed from nobody to Chetan Khanna
Status: newassigned

Yes, it sure is tricky!

https://github.com/django/django/pull/12546

I have prepared a patch that should work for the update case at least. And sorry for prolonging this, I had my university exams in the middle and the implementation didn't come to me at once but I still wanted to give it a shot. I've tried to follow whatever was said in https://code.djangoproject.com/ticket/29214#comment:12 as closely as possible. Awaiting review :)

comment:16 Changed 5 months ago by felixxm

Patch needs improvement: set

comment:17 Changed 5 months ago by Chetan Khanna

Sorry, I forgot that. I got occupied with my GSoC application since the submissions have already started.

Note: See TracTickets for help on using tickets.
Back to Top