Opened 3 years ago

Last modified 3 years ago

#24386 new Bug

Querysets with filters and exclusions based on deep relations build invalid queries

Reported by: Raphael Gaschignard Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've hit this bug in 1.6, and reproduced it on master (django v1.9.dev20150221182749)

I have the following models:

class Owner(models.Model):
    pass

class Employee(models.Model):
    owner = models.ForeignKey(Owner)
    status = models.CharField(max_length=100)
    start_date = models.DateField()

When trying to find the oldest active employees for each owner, I tried the following query:

>>> query = Employee.objects.filter(status='active').exclude(owner__employee__start_date__lte=F('start_date'),owner__employee__status='active').distinct()
>>> query
  [...]
  File "/Users/rtpg/proj/test_django/django-trunk/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/Users/rtpg/proj/test_django/django-trunk/django/db/utils.py", line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/Users/rtpg/proj/test_django/django-trunk/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
ProgrammingError: missing FROM-clause entry for table "u1"
LINE 1: ...e_employee" U0 INNER JOIN "core_employee" U2 ON ( U1."id" = ...
                                                             ^

The SQL query generated by the queryset seems to be at fault

>>> print query.query
SELECT DISTINCT "core_employee"."id", "core_employee"."owner_id", "core_employee"."status", "core_employee"."start_date" FROM "core_employee" WHERE ("core_employee"."status" = active AND NOT ("core_employee"."owner_id" IN (SELECT U2."owner_id" AS Col1 FROM "core_employee" U2 WHERE U2."status" = active) AND "core_employee"."owner_id" IN (SELECT U2."owner_id" AS Col1 FROM "core_employee" U0 INNER JOIN "core_employee" U2 ON ( U1."id" = U2."owner_id" ) WHERE U2."start_date" <= (U0."start_date"))))

I'm not very good at reading complex SQL queries, so I'm not sure what this is supposed to look like but in any case this query is refused by PostgreSQL

Change History (5)

comment:1 Changed 3 years ago by Shai Berger

Needs tests: set
Triage Stage: UnreviewedAccepted

Accepting based on details in report, didn't reproduce myself.

Just wanted to note for the submitter that even without the bug, the query you wrote would not achieve the stated goals -- it would return no employees. To do what you intended, you need to replace your lte with lt. Also, another approach for this relies on annotations (annotate employee with Min(owner__employee__startdate) and filter on that).

comment:2 Changed 3 years ago by Raphael Gaschignard

I don't know if annotations allow me to work around this. I need to find the minimum of only active employees, but to my knowledge the aggregates over related fields don't really work in combination with filters:

I added a number field to my Employee model:

In [8]: Employee.objects.annotate(Min('owner__employee__number')).values()
Out[8]: [{'status': u'not_active', 'number': 3, u'id': 2, 'owner__employee__number__min': 3, 'start_date': datetime.date(2015, 2, 23), u'owner_id': 1}, {'status': u'active', 'number': 4, u'id': 1, 'owner__employee__number__min': 3, 'start_date': datetime.date(2015, 2, 23), u'owner_id': 1}]

In [9]: Employee.objects.filter(number=4).annotate(Min('owner__employee__number')).values()
Out[9]: [{'status': u'active', 'number': 4, u'id': 1, 'owner__employee__number__min': 3, 'start_date': datetime.date(2015, 2, 23), u'owner_id': 1}]
Last edited 3 years ago by Raphael Gaschignard (previous) (diff)

comment:3 Changed 3 years ago by Simon Charette

To find the oldest active employees for each owner you could use a combination of DISTINCT ON and ORDER BY:

Employee.objects.filter(status='active').order_by('owner', 'start_time').distinct('owner')

comment:4 Changed 3 years ago by Raphael Gaschignard

Seemed I lacked a bit of imagination on that one, thanks for the workaround for my specific use case charettes. Though I guess that wouldn't work on a non-Postgres DB.

comment:5 Changed 3 years ago by David Gouldin

It looks like this is as a result of using the combination of an exclude filter with multiple joins and an F object. The multi-join exclude causes the ORM to represent the filter as a subquery, but since that Query object doesn't know about its parent, it has to try to make sense of the F object in the context of the subquery.

Note: See TracTickets for help on using tickets.
Back to Top