Opened 12 years ago

Last modified 11 years ago

#18375 closed Bug

F() doesn't work as expected across multijoin relations — at Version 13

Reported by: FunkyBob Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM) Version: 1.4
Severity: Release blocker Keywords:
Cc: German M. Bravo Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ramiro Morales)

When trying to compare a field to another on the same related record, F() will compare against a separate alias of the table, thus not guaranteeing filtering against the same row.

There doesn't appear to be anything in the docs or tests to indicate what the 'correct' behavior is. Also, there's no apparent way to control it.

Using the attached models, I get the following:

(InteractiveConsole)
>>> from sample import models
>>> from django.db.models import Q, F
>>> qset = models.Uber.objects.filter(unter__available__gt=F('unter__used'))
>>> str(qset.query)
'SELECT "sample_uber"."id", "sample_uber"."title" FROM "sample_uber" INNER JOIN "sample_unter" ON ("sample_uber"."id" = "sample_unter"."uber_id") INNER JOIN "sample_unter" T3 ON ("sample_uber"."id" = T3."uber_id") WHERE T3."available" >  "sample_unter"."used"'

The SQL nicely formatted shows:

SELECT "sample_uber"."id", "sample_uber"."title"
FROM "sample_uber"
INNER JOIN "sample_unter" ON ("sample_uber"."id" = "sample_unter"."uber_id")
INNER JOIN "sample_unter" T3 ON ("sample_uber"."id" = T3."uber_id")
WHERE T3."available" >  "sample_unter"."used"

So, the F('unter__used') is using the 'sample_unter' join, whereas the unter__available__gt is using the T3 alias.

Change History (14)

by FunkyBob, 12 years ago

Attachment: models.py added

comment:1 by Anssi Kääriäinen, 12 years ago

Summary: F() doesn't work as expected across relationsF() doesn't work as expected across multijoin relations
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

To me this seems like a bug, or at least non-wanted behavior. Usually references inside one operation target the same alias, for example filter(unter__available__gt=1, unter__exact=foo) should target the same unter reference, while filter(unter__available__gt=1).filter(unter__exact=foo) should target different references.

If this can be fixed at this point needs at least some consideration. The change could break existing queries, and there wouldn't be any way to get the original behavior back, as you can split one filter clause to two different filter operations.

I have been thinking that it would be useful to be able to annotate the relations in the query. So, you could do:

.annotate(unter_ref='unter').filter(unter_ref__available__gt=F('unter_ref__used'))
OR
.annotate(unter_ref1='unter', unter_ref2='unter').filter(unter_ref1__available__gt=F('unter_ref2__used'))

The first would generate the query you are looking for, the second would generate what you get now. However there is a long way to get to the point where you can use the annotate() like in the example.

I am marking this as accepted. The reasoning is that the reference under the same .filter() call should target the same relation, and that the current behavior doesn't seem that useful. Backwards compatibility could turn this decision however.

For the time being I can't figure out a way to generate the query you want without using qs.extra().

comment:2 by FunkyBob, 12 years ago

A thought that came to me was a new keyword argument for F() to suggest it try to use an existing join to that table...

comment:3 by Anssi Kääriäinen, 12 years ago

Has patch: set

What happens is that when a filter is added to query, the right hand value (F-obj in this case) is added first. At this point no joins exist. Then, the left hand side is added to the query (the lookup is resolved), but it can't reuse the join generated by the F-obj. At this point a new join is generated -> problems.

Usually the F-obj will reuse any existing join, regardless if the join is generated in the same filter call or not. I am calling this a bug. Actually, this _must_ be a bug. Doing

.filter(unter__available__gt=0, unter__available__gt=F('unter__used'))

the F will target the existing join if the unter__available__gt=0 is evaluated first. If not then it will generate a new join. This means that randomized dict ordering will cause random results from the above query.

See https://github.com/akaariai/django/tree/ticket_18375 for one possible solution for this.

comment:4 by Anssi Kääriäinen, 12 years ago

There might be an easier fix: just pass the reusable joins set from add_filter to the SQLEvaluator, then to prepare and finally to setup_joins. This means just adding the parameter to couple of methods (it can be by default the empty tuple), and things should work.

comment:5 by Anssi Kääriäinen, 12 years ago

I have verified that dict-randomization will randomize the result of the query mentioned in comment:3.

I think we have to go with what FunkyBob suggests in comment:2 - add a "reuse_joins" argument to F() (better name welcome). If we change how F() is added to query without adding a way to define the join-reuse then we will be leaving some users in a hard situation: upgrade to 1.5 breaks their project with no way to fix the breakage.

I suggest we default to "reuse_joins=True", this way every clause under same .filter() call will target the same joins, unless explicitly asked to work otherwise.

I will try to write a patch for this.

comment:6 by Ian Clelland, 12 years ago

akaariai, shouldn't the query in comment:3 fail with a Syntax error? You can't specify the same keyword argument twice. The only way to filter twice on the same keyword is to call .filter() twice, which is well defined (in execution semantics, at least), or include duplicate Q objects (or mix Q objects with a keyword argument). Otherwise, you at least need to use a different operator, to filter twice on the same database field. Either

.filter(Q(unter__available__gt=0), Q(unter__available__gt=F('unter__used')))

or

.filter(Q(unter__available__gte=1, unter__available__gt=F('unter__used'))

should trigger this behaviour, though.

comment:7 by Anssi Kääriäinen, 12 years ago

True, that example isn't valid Python.

The simplest example is something like this:

.filter(unter__id=1, id=F('unter__id'))

where unter is any reverse relation.

Depending on the ordering of the kwargs dict the F() either creates a new join or not. This means the same query will execute differently on different runs if dict randomization is turned on.

comment:8 by Aymeric Augustin, 12 years ago

Since we aim for Python 3.3 support in Django 1.5, this is a release blocker.

comment:9 by Anssi Kääriäinen, 12 years ago

Owner: changed from nobody to Anssi Kääriäinen
Severity: NormalRelease blocker

comment:10 by Anssi Kääriäinen, 12 years ago

Hmmh, I think this one is even more complex than I previously thought - how about

qs.filter(someval=F('revrel__somecol'), revrel__othercol=F('someval'))

If the first kwarg is seen first, then we will create a join for revrel. However, this join is _not_ reusable for the second filter's lookup. If these are seen in the other order, then we first generate revrel for the lookup, then reuse it in F().

I know we can fix the above. But I am wondering if we could just weasel out of this situation: document that use Q() lookups instead if dict ordering might be a problem. Q() lookups do have a defined order, so there is no problem in that case.

I have a work in progress patch at: https://github.com/akaariai/django/compare/ticket_18375

comment:11 by Anssi Kääriäinen, 12 years ago

I rebased the ticket_18375 branch with another attempt. Now I like what it is doing in code, but I still wonder how to document this in release notes. What I have now seems a little complex...

Docs/API design help welcomed!

comment:12 by Nicolas Lara, 11 years ago

I think the documentation for this is overly complicated and requires detailed knowledge of what the SQL translation will be (which the ORM users shouldn't need).

The fix to only use one reference would probably work for 90% of the cases and for the other cases where you need to control exactly how the SQL query would be written you can fallback to raw SQL. I don't see a need for DupeMultisF; this fine grained control belongs in the SQL layer and not the ORM layer. Documenting that this generates a single join in 1.5 should be enough.

comment:13 by Ramiro Morales, 11 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top