Opened 7 years ago

Last modified 9 months ago

#29214 new Bug

Invalid SQL generated when annotating a subquery with an outerref to an annotated field.

Reported by: Oskar Persson Owned by:
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: queryset annotations
Cc: Mariusz Felisiak, Simon Charette, Chetan Khanna Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

An SQL-error is raised when updating a queryset which includes an annotation created using a subquery, which in turn also uses an annotation.

class Recursive(models.Model):
    name = models.CharField(max_length=10)
    link = models.IntegerField()
    parent = models.ForeignKey('self', models.CASCADE, null=True)

from django.db.models import OuterRef, Subquery, F

rec1 = Recursive.objects.create(name="r1", link=1)
rec2 = Recursive.objects.create(name="r2", link=1)
rec3 = Recursive.objects.create(name="r11", parent=rec1, link=2)

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))
Traceback (most recent call last):
  File "tests.py", line 88, in test_update_annotated_using_related_queryset
    .update(parent_id=F('p'))
  File "/git/django/django/db/models/query.py", line 647, in update
    rows = query.get_compiler(self.db).execute_sql(CURSOR)
  File "/git/django/django/db/models/sql/compiler.py", line 1204, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/git/django/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
OperationalError: no such column: T2.link

I've tried this in SQLite and MySQL in Django 1.11.11 and Django 2.0 using Python 3.6.0. All raising similar errors.

Attachments (1)

ticket29214.diff (1.5 KB ) - added by Chetan Khanna 5 years ago.
Test file

Download all attachments as: .zip

Change History (25)

comment:1 by Tim Graham, 7 years ago

Description: modified (diff)
Summary: SQL-error when updating after annotating using subqueryInvalid SQL when updating after annotating using subquery
Triage Stage: UnreviewedAccepted

comment:2 by Carlton Gibson, 6 years ago

Related: #29542

comment:3 by Mariusz Felisiak, 6 years ago

Cc: Mariusz Felisiak added

comment:4 by Oskar Persson, 6 years ago

Just tried this with the fix for #29542 and the problem remains.

comment:5 by Simon Charette, 6 years ago

Summary: Invalid SQL when updating after annotating using subqueryInvalid SQL generated when annotating a subquery with an outerref to an annotated field.

#30009 was a duplicate that wasn't relying on update().

comment:6 by Simon Charette, 6 years ago

As mentioned in #30009 this has something to do with alias quoting and is probably related to Subquery.resolve_expression's alteration of external_alias as a comment mentions it's done to prevent quoting.

comment:7 by Oskar Persson, 5 years ago

Version: 1.11

comment:8 by Oskar Persson, 5 years ago

Version: master

in reply to:  5 comment:9 by Chetan Khanna, 5 years ago

Replying to Simon Charette:

#30009 was a duplicate that wasn't relying on update().

I don't think this issue is strictly related to annotate. Possibly, its the update() clause. If we remove the update() clause, the query runs just fine.

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .values()

which produces the following SQL:

SELECT "recursive_model"."id",
       "recursive_model"."name",
       "recursive_model"."link",
       "recursive_model"."parent_id",
       T2."link" AS "parent_link",
       (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       ) AS "p"
  FROM "recursive_model"
 INNER JOIN "recursive_model" T2
    ON ("recursive_model"."parent_id" = T2."id")
 WHERE "recursive_model"."parent_id" IS NOT NULL
 LIMIT 21

Also the query on https://code.djangoproject.com/ticket/30009 seems to run just fine on current master. (I had to tweak it a bit since the otherwise the ORM complained about multiple columns being returned)

query:

Task.objects.annotate(                          
    top_case_id=Coalesce(F('case__parent_case__parent_case_id'), F('case__parent_case_id'), F('case_id')),
    subject=Subquery(Subject.objects.filter(case_id=OuterRef('top_case_id')).values('id'), output_field=IntegerField())
 ).all()

corresponding SQL:

SELECT "ticket29214_task"."id",
       "ticket29214_task"."num",
       "ticket29214_task"."case_id",
       COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id") AS "top_case_id",
       (
        SELECT U0."id"
          FROM "ticket29214_subject" U0
         WHERE U0."case_id" = COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id")
       ) AS "subject"
  FROM "ticket29214_task"
  LEFT OUTER JOIN "ticket29214_case"
    ON ("ticket29214_task"."case_id" = "ticket29214_case"."id")
  LEFT OUTER JOIN "ticket29214_case" T3
    ON ("ticket29214_case"."parent_case_id" = T3."id")
 LIMIT 21

(No extra quoting aroung "T3" as mentioned in the description)

Database used was PostgreSQL and SQLs are picked from shell_plus of django_extensions.

comment:10 by Simon Charette, 5 years ago

Cc: Simon Charette added

Chetan did you reproduce the update() crash against latest master as well? There has been a few tweaks to subquery aliases quoting in the past weeks.

Could you provide the generated problematic UPDATE SQL since it's missing from the ticket?

comment:11 by Chetan Khanna, 5 years ago

Cc: Chetan Khanna added

Yes. On running the same query, I get the following SQL and traceback:

latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')

Recursive.objects.filter(parent__isnull=False) \
        .annotate(parent_link=F('parent__link')) \
        .annotate(p=Subquery(latest_parent.values('pk')[:1])) \
        .update(parent_id=F('p'))

SQL:

UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" = T2."link"
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )

Traceback:

ProgrammingError: missing FROM-clause entry for table "t2"
LINE 1: ...."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ...

Unfortunately I haven't looked deeper, but I am planning to during this weekend. I'll report if I get something else.

comment:12 by Simon Charette, 5 years ago

Thanks!

Something suspicious here is the usage of F('parent__link') which should be disallowed per https://code.djangoproject.com/ticket/14104#comment:1. We don't support JOINs in UPDATE unless I'm mistaken so I suspect the JOIN'ed T2 table is simply get pruned by SQLUpdateCompiler.

The only way to add support for such query would be to make OuterRef('parent__link') result in a JOIN within the subquery, notice how the filter(parent__isnull=False) predicate results in a subquery instead of an INNER JOIN like it does when .update is not used in comment:9.

I assume we'd want do something similar for OuterRef that include a __ so the resulting query is

latest_parent = Recursive.objects.filter(link=OuterRef('parent__link')).order_by('id')
Recursive.objects.filter(
    parent__isnull=False
).update(parent_id=Subquery(latest_parent.values('pk')[:1])
UPDATE "recursive_model"
   SET "parent_id" = (
        SELECT U0."id"
          FROM "recursive_model" U0
         WHERE U0."link" IN (
             SELECT U1."link"
             FROM "recursive_model" U1
             WHERE U1."id" = "recursive_model"."parent_id"
         )
         ORDER BY U0."id" ASC
         LIMIT 1
       )
 WHERE "recursive_model"."id" IN (
        SELECT V0."id"
          FROM "recursive_model" V0
         INNER JOIN "recursive_model" V1
            ON (V0."parent_id" = V1."id")
         WHERE V0."parent_id" IS NOT NULL
       )

comment:13 by Chetan Khanna, 5 years ago

That comment was really insightful, thank you! :)

If you don't mind confirming, then this is what I could get from the comment:
Whenever there is a filter() clause, preceding an update() clause, and containing fields from related models joined via __, the ORM gives back a SQL with an INNER JOIN inside a subquery. And for this issue, we want to have a similar functionality in the OuterRef that contain __.

If so, I think I have created a test from our test suite that might serve as a starting point.

by Chetan Khanna, 5 years ago

Attachment: ticket29214.diff added

Test file

comment:14 by Simon Charette, 5 years ago

Chetan, almost correct. The query will not necessarily involve generating an INNER JOIN but it happens to do so in this case.

Most of the logic lives in SQLUpdateCompiler but the idea is that if the backend supports it the whole filtering part of the queryset will be pushed down to a subquery, JOINs included, and filtered against using pk IN.

This will be way trickier to implement for subqueries and I'm starting to wonder if we didn't get it wrong in the first place by making OuteRef('outerqs__lookup') result in a new JOIN in the outer query instead of encapsulating it in the subquery. The way we do it right now breaks aggregation in subtle ways #29214 and happens to break update as well as we came to discover.

If the JOIN was always encapsulated in the subquery in the first place it would solve this ticket, #29214 and I suspect it would make implementing #28296 way easier.

comment:15 by Chetan Khanna, 5 years ago

Has patch: set
Owner: changed from nobody to Chetan Khanna
Status: newassigned

Yes, it sure is tricky!

https://github.com/django/django/pull/12546

I have prepared a patch that should work for the update case at least. And sorry for prolonging this, I had my university exams in the middle and the implementation didn't come to me at once but I still wanted to give it a shot. I've tried to follow whatever was said in https://code.djangoproject.com/ticket/29214#comment:12 as closely as possible. Awaiting review :)

comment:16 by Mariusz Felisiak, 5 years ago

Patch needs improvement: set

comment:17 by Chetan Khanna, 5 years ago

Sorry, I forgot that. I got occupied with my GSoC application since the submissions have already started.

comment:18 by Mariusz Felisiak, 3 years ago

#32839 is a duplicate with FilteredRelation annotations.

comment:19 by Bálint Balina, 3 years ago

Is there a way I can help with this issue? I have just reported #32839 and I am eager to resolve the problem, but I can see here that there is already a work in progress.

Last edited 3 years ago by Tim Graham (previous) (diff)

comment:20 by Jacob Walls, 3 years ago

Absolutely! If you want to use the previous patch as a starting point, be sure to credit the original author with Co-authored-by: Name <email> in the commit message. Assign yourself the ticket when you intend to get going.

comment:21 by Bálint Balina, 3 years ago

Ok, I have super simple solution, but I would like to pre check if this is a possible way to go. Can we use lower case table aliases?

I have applied the following monkeypatch, and everything works fine for my case:

Query.alias_prefix = 't'
Query.subq_aliases = frozenset(['t'])

def bump_prefix(self, outer_query):
    ...
    alphabet = ascii_lowercase
    ...
Query.bump_prefix = bump_prefix

comment:22 by Simon Charette, 3 years ago

No it's just working around the issue and will break on backends that follow the SQL standard with regards to unquoted aliases (e.g. Oracle).

comment:23 by Mariusz Felisiak, 2 years ago

Owner: Chetan Khanna removed
Status: assignednew

comment:24 by aysum1914, 9 months ago

Well, I came here from #32839 and believe that it better explains the issue. I also see a 'missing FROM-clause entry for table "T5" ' error when using an annotated field in a subquery with OuterRef which is joined multiple times in the query and gets the name of T5. When I checked the raw query I saw that the table joined as T5 but used in the subquery as "T5" and then raised this error.

Last edited 9 months ago by aysum1914 (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top