Code

Opened 13 months ago

Closed 13 months ago

Last modified 13 months ago

#20564 closed Bug (fixed)

Querying (exclude) on a GenericRelation's DateTimeField through a related object fails

Reported by: nferrari Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6-alpha-1
Severity: Release blocker Keywords: contenttype exclude
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

While excluding directly on a GenericRelation's DateTimeField works (see c3 variable below), it does fail if there is a related object in between (c4). This does not occur if lookup is done on a CharField (c1 and c2).

Here are simple models to illustrate:

from django.db import models
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType

class A(models.Model):

    name = models.CharField(max_length="50")
    flag = models.DateTimeField(null=True, blank=True)

    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')

    class Meta:
        app_label = 'foobar'

class B(models.Model):

    a = generic.GenericRelation(A)

    class Meta:
        app_label = 'foobar'

class C(models.Model):

    b = models.ForeignKey(B)
    a = generic.GenericRelation(A)

    class Meta:
        app_label = 'foobar'

# Works:
c1 = C.objects.exclude(a__name="foo")
c2 = C.objects.exclude(b__a__name="foo")
c3 = C.objects.exclude(a__flag=None)

# Fails:
c4 = C.objects.exclude(b__a__flag=None)

Find a workaround seems pretty tricky in some cases...

Attachments (2)

dump.json (1.1 KB) - added by nferrari 13 months ago.
A, B and C test objects.
dumpct.json (1.4 KB) - added by nferrari 13 months ago.
Content types dump.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 13 months ago by nferrari

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Sorry, I forgot the main part, the traceback:

Traceback (most recent call last):
  File "./exclude_generic.py", line 38, in <module>
    c5 = C.objects.exclude(b__a__flag__isnull=True)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/manager.py", line 175, in exclude
    return self.get_queryset().exclude(*args, **kwargs)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/query.py", line 601, in exclude
    return self._filter_or_exclude(True, *args, **kwargs)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/query.py", line 610, in _filter_or_exclude
    clone.query.add_q(~Q(*args, **kwargs))
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1200, in add_q
    clause = self._add_q(where_part, used_aliases)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1236, in _add_q
    current_negated=current_negated)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1107, in build_filter
    can_reuse, e.names_with_path)
  File "/data/www/dev/nferrari/env/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1455, in split_exclude
    join_field.field.foreign_related_fields[0].name)
AttributeError: 'ForeignKey' object has no attribute 'field'

This is a regression because it worked well on 1.5 version, this commit seems responsible (blame on github):
https://github.com/django/django/commit/97774429aeb54df4c09895c07cd1b09e70201f7d

Last edited 13 months ago by nferrari (previous) (diff)

comment:2 Changed 13 months ago by akaariai

  • Keywords datetimefield removed
  • Severity changed from Normal to Release blocker
  • Triage Stage changed from Unreviewed to Accepted

I have confirmed this. The reason is the trim_prefix() code in split_exclude(). This code is really ugly (and I am to blame). Lets see if there is any easy fix for this. In any case rewriting the whole trim_prefix() logic will need to be done at some point.

The reason doesn't seem to be using a datetime field, it seems to be using a nullable field.

comment:3 Changed 13 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 31fd64ad8a98d7de0acb9144ae6f7bd124700cb0:

Fixed #20564 -- Generic relations exclude() regression

The patch for #19385 caused a regression in certain generic relations
.exclude() filters if a subquery was needed. The fix contains a
refactoring to how Query.split_exclude() and Query.trim_start()
interact.

Thanks to Trac alias nferrari for the report.

comment:4 Changed 13 months ago by akaariai

I think the split_exclude() + trim_start() interaction is now logically correct. Unfortunately the code is complex. Before it was complex and incorrect, now complex and (hopefully) correct.

The source of this complexity is SQL's NULL handling weirdness (everybody loves booleans which have three values, right?), and then in addition how IN (and especially NOT IN) work in SQL. NULLs in the inner and outer query will cause problems. Maybe cleaner code + more comments would help. But just writing these queries in plain SQL is a confusing task.

comment:5 Changed 13 months ago by nferrari

Thanks for your reactivity.

comment:6 Changed 13 months ago by nferrari

I just tried, unfortunately it seems the result count() is very diffent. Would you like me to provide some detailed tests?

comment:7 Changed 13 months ago by akaariai

Yes, please do.

Changed 13 months ago by nferrari

A, B and C test objects.

comment:8 Changed 13 months ago by nferrari

Ok I updated the models to have a name CharField each, and inserted a few objects (see dump.json).

The 1.5 shell returned:

In [1]: from foobar.models import *

In [2]: C.objects.exclude(a__flag=None).count()
Out[2]: 5

In [3]: C.objects.exclude(b__a__flag=None).count()
Out[3]: 4

While the new environment shell gives:

>>> from foobar.models import *
>>> C.objects.exclude(a__flag=None).count()
0
>>> C.objects.exclude(b__a__flag=None).count()
0

Old results are expected, let me know if you need more information.

comment:9 Changed 13 months ago by akaariai

Are you sure you have same contentype IDs in both 1.5 and 1.6 databases? This will cause differing results. For the same reason the dump isn't easy to use - I don't actually know what the content_type values refer to in the dump.

The contenttype ID mismatch would explain the difference - every a__flag is None if the contenttype ids do not match.

Changed 13 months ago by nferrari

Content types dump.

comment:10 Changed 13 months ago by nferrari

I'm pretty sure because I use the same database for both shells. I just switch from a 1.5 to a 1.6 (trunk) environment!
I uploaded the content types dump, where you see A, B and C are respectively 7, 8 and 9.

comment:11 Changed 13 months ago by akaariai

Thanks, I will investigate this.

comment:12 Changed 13 months ago by akaariai

I believe 1.5 just didn't handle these queries correctly. For example the queryset C.objects.filter(b__a__flag=None) resulted in this SQL:

SELECT "generic_relations_regress_c"."id", "generic_relations_regress_c"."b_id" FROM "generic_relations_regress_c"
INNER JOIN "generic_relations_regress_b" ON ("generic_relations_regress_c"."b_id" = "generic_relations_regress_b"."id")
LEFT OUTER JOIN "generic_relations_regress_a" ON ("generic_relations_regress_b"."id" = "generic_relations_regress_a"."object_id")
LEFT OUTER JOIN "django_content_type" ON ("generic_relations_regress_a"."content_type_id" = "django_content_type"."id")
WHERE ("generic_relations_regress_a"."flag" IS NULL AND "django_content_type"."id" = 25 )
ORDER BY "generic_relations_regress_c"."id" ASC

Note the LEFT OUTER JOIN + contenttype id restriction in WHERE clause. That just can't be correct (the LEFT OUTER JOIN doesn't work if it is constrained like that in WHERE clause).

The 1.6 version is this:

SELECT "generic_relations_regress_c"."id", "generic_relations_regress_c"."b_id"
FROM "generic_relations_regress_c"
INNER JOIN "generic_relations_regress_b" ON ( "generic_relations_regress_c"."b_id" = "generic_relations_regress_b"."id" )
LEFT OUTER JOIN "generic_relations_regress_a" ON (
    "generic_relations_regress_b"."id" = "generic_relations_regress_a"."object_id"
     AND ("generic_relations_regress_a"."content_type_id" = 28))
WHERE "generic_relations_regress_a"."flag" IS NULL ORDER BY "generic_relations_regress_c"."id" ASC

Note the content_type_id restriction in JOIN clause itself. This seems correct to me.

Also, in the JSON you gave there aren't actually any flag != null objects. So, the query C.objects.filter(a__flag=None) will find every C object (either there is no A at all, or the a has flag=None), and thus C.objects.exclude(a__flag=None) can't find anything.

So, I believe that you are getting incorrect results in 1.5. Of course, the results might be what you want, but they aren't what the queries should produce.

I wonder if we should add release notes about ORM changes...

comment:13 Changed 13 months ago by nferrari

I really do not understand the new logic. If we look only at the first query, with C objects and relationship with A. I'm looking for every C object having either no relation with any A object OR having a relationship only with some A objects having a value on the flag DateTimeField. Thus, I'm excluding every C object having a relation with some A objects having no value in the flag field.

This query makes sense to me:

C.objects.exclude(a__flag=None)

I tried this one too:

C.objects.exclude(a__flag__istrue=True)


... but the result is the same.

Does that make sense to you? Maybe I could find an example more explicit than A, B and C objects...

Last edited 13 months ago by nferrari (previous) (diff)

comment:14 Changed 13 months ago by akaariai

The logic is that now C.objects.exclude(a__flag=None) works the same way, no matter if a is a generic relation or normal reverse relation. That is, with models:

class C(models.Model):
    pass

class A(models.Model):
    flag = models.DateTimeField()
    c = models.ForeignKey(C, related_name='a')

C.objects.filter(a__flag=None) behaves the same way as with GenericRelation.

comment:15 Changed 13 months ago by nferrari

If the logic is better, fine. Now, I'm looking for another way to make my query through Django ORM? If not, this is a regression.

comment:16 Changed 13 months ago by akaariai

It is irrelevant if the new logic is better. It is how this ORM operation has been defined, the 1.5 code just didn't work the way it was supposed to work for GenericRelations.

So, the query is:

  • all C objects with no a objects:

Q(a__isnull=True)

  • OR having a relationship with A objects having a value <=> no flag is NULL | ~Q(a__flag__isnull=True)

There queries are unfortunately very complex to write using the ORM. I am not sure if the above one is actually what you want... In addition, you will need a .distinct() in case some C has multiple a's each having non-null flag.

So, C.objects.filter(Q(a__isnull=True)|~Q(a__flag__isnull=True)).distinct()

I think there would be some room for better multijoin filtering capabilities in the ORM. These are complex to write using the ORM, complex to write using raw SQL (try to find computers that have exactly Windows 7 and Linux as their operating system set) and the queries the ORM produces for these filters aren't optimal.

This ticket doesn't seem to be the correct place to discuss this further. If you claim a regression or want to discuss about possible additions to ORM multijoin filtering capabilities the DevelopersMailingList is the correct place to do that. If you want to discuss about the needed ORM operations for the specific query, django-users might be the correct place to discuss that.

comment:17 Changed 13 months ago by nferrari

Thank you for your detailed answer. The query seems to return expected result.

I'll check that with my main application, and I'll try to fully understand why the 1.5 code didn't work the way it was supposed to for GR, because the old query seemed logical to me, while the new one is more complex.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.