Opened 2 years ago

Last modified 8 hours ago

#29214 new Bug

Invalid SQL generated when annotating a subquery with an outerref to an annotated field.

Reported by: Oskar Persson Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset annotations
Cc: felixxm, Simon Charette, Chetan Khanna Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

An SQL-error is raised when updating a queryset which includes an annotation created using a subquery, which in turn also uses an annotation.

class Recursive(models.Model):
    name = models.CharField(max_length=10)
    link = models.IntegerField()
    parent = models.ForeignKey('self', models.CASCADE, null=True)

from django.db.models import OuterRef, Subquery, F

rec1 = Recursive.objects.create(name="r1", link=1)
rec2 = Recursive.objects.create(name="r2", link=1)
rec3 = Recursive.objects.create(name="r11", parent=rec1, link=2)

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))
Traceback (most recent call last):
  File "tests.py", line 88, in test_update_annotated_using_related_queryset
    .update(parent_id=F('p'))
  File "/git/django/django/db/models/query.py", line 647, in update
    rows = query.get_compiler(self.db).execute_sql(CURSOR)
  File "/git/django/django/db/models/sql/compiler.py", line 1204, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/git/django/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
OperationalError: no such column: T2.link

I've tried this in SQLite and MySQL in Django 1.11.11 and Django 2.0 using Python 3.6.0. All raising similar errors.

Change History (12)

comment:1 Changed 2 years ago by Tim Graham

Description: modified (diff)
Summary: SQL-error when updating after annotating using subqueryInvalid SQL when updating after annotating using subquery
Triage Stage: UnreviewedAccepted

comment:2 Changed 20 months ago by Carlton Gibson

Related: #29542

comment:3 Changed 20 months ago by felixxm

Cc: felixxm added

comment:4 Changed 19 months ago by Oskar Persson

Just tried this with the fix for #29542 and the problem remains.

comment:5 Changed 15 months ago by Simon Charette

Summary: Invalid SQL when updating after annotating using subqueryInvalid SQL generated when annotating a subquery with an outerref to an annotated field.

#30009 was a duplicate that wasn't relying on update().

comment:6 Changed 15 months ago by Simon Charette

As mentioned in #30009 this has something to do with alias quoting and is probably related to Subquery.resolve_expression's alteration of external_alias as a comment mentions it's done to prevent quoting.

comment:7 Changed 5 days ago by Oskar Persson

Version: 1.11

comment:8 Changed 5 days ago by Oskar Persson

Version: master

comment:9 in reply to:  5 Changed 10 hours ago by Chetan Khanna

Replying to Simon Charette:

#30009 was a duplicate that wasn't relying on update().

I don't think this issue is strictly related to annotate. Possibly, its the update() clause. If we remove the update() clause, the query runs just fine.

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .values()

which produces the following SQL:

SELECT "recursive_model"."id",
       "recursive_model"."name",
       "recursive_model"."link",
       "recursive_model"."parent_id",
       T2."link" AS "parent_link",
       (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       ) AS "p"
  FROM "recursive_model"
 INNER JOIN "recursive_model" T2
    ON ("recursive_model"."parent_id" = T2."id")
 WHERE "recursive_model"."parent_id" IS NOT NULL
 LIMIT 21

Also the query on https://code.djangoproject.com/ticket/30009 seems to run just fine on current master. (I had to tweak it a bit since the otherwise the ORM complained about multiple columns being returned)

query:

Task.objects.annotate(                          
    top_case_id=Coalesce(F('case__parent_case__parent_case_id'), F('case__parent_case_id'), F('case_id')),
    subject=Subquery(Subject.objects.filter(case_id=OuterRef('top_case_id')).values('id'), output_field=IntegerField())
 ).all()

corresponding SQL:

SELECT "ticket29214_task"."id",
       "ticket29214_task"."num",
       "ticket29214_task"."case_id",
       COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id") AS "top_case_id",
       (
        SELECT U0."id"
          FROM "ticket29214_subject" U0
         WHERE U0."case_id" = COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id")
       ) AS "subject"
  FROM "ticket29214_task"
  LEFT OUTER JOIN "ticket29214_case"
    ON ("ticket29214_task"."case_id" = "ticket29214_case"."id")
  LEFT OUTER JOIN "ticket29214_case" T3
    ON ("ticket29214_case"."parent_case_id" = T3."id")
 LIMIT 21

(No extra quoting aroung "T3" as mentioned in the description)

Database used was PostgreSQL and SQLs are picked from shell_plus of django_extensions.

comment:10 Changed 9 hours ago by Simon Charette

Cc: Simon Charette added

Chetan did you reproduce the update() crash against latest master as well? There has been a few tweaks to subquery aliases quoting in the past weeks.

Could you provide the generated problematic UPDATE SQL since it's missing from the ticket?

comment:11 Changed 9 hours ago by Chetan Khanna

Cc: Chetan Khanna added

Yes. On running the same query, I get the following SQL and traceback:

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))

SQL:

UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )

Traceback:

ProgrammingError: missing FROM-clause entry for table "t2"
LINE 1: ...."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ...

Unfortunately I haven't looked deeper, but I am planning to during this weekend. I'll report if I get something else.

comment:12 Changed 8 hours ago by Simon Charette

Thanks!

Something suspicious here is the usage of F('parent__link') which should be disallowed per https://code.djangoproject.com/ticket/14104#comment:1. We don't support JOINs in UPDATE unless I'm mistaken so I suspect the JOIN'ed T2 table is simply get pruned by SQLUpdateCompiler.

The only way to add support for such query would be to make OuterRef('parent__link') result in a JOIN within the subquery, notice how the filter(parent__isnull=False) predicate results in a subquery instead of an INNER JOIN like it does when .update is not used in comment:9.

I assume we'd want do something similar for OuterRef that include a __ so the resulting query is

latest_parent = Recursive.objects.filter(link=OuterRef('parent__link')).order_by('id')
Recursive.objects.filter(
    parent__isnull=False
).update(parent_id=Subquery(latest_parent.values('pk')[:1])
UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" IN (
             SELECT U1."link"
             FROM "recursive_model" U1
             WHERE U1."id" = "recursive_model"."parent_id"
         )
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )
Note: See TracTickets for help on using tickets.
Back to Top