Opened 15 years ago

Closed 13 years ago

Last modified 12 years ago

#11319 closed (fixed)

ForeignKey filters use the wrong field to prepare values for database

Reported by: Russell Keith-Magee Owned by: Carl Meyer
Component: Database layer (models, ORM) Version: 1.0
Severity: Keywords:
Cc: kmishler@…, Carsten Fuchs, Andrey Golovizin Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

There are a couple of edge cases where filters on foreign keys can fail, as a result of a deep-seated problem with the way get_db_prep_value() is used.

If Model A has a foreign key pointing to B, a query of A.objects.filter(b=b_instance) will use the primary key on A to prepare the value for the database.

If both models have the same primary key type, this poses no problem.

However if:

  1. the two fields have a different get_db_prep_value() implementation (usually caused by having primary keys that are of substantially different types), or
  2. the foreign key specifies a to_field argument

then the value prepared for database lookup will be wrong.

Attached patch describes regression tests for this problem.

Related tickets:

  • #10243 papered over this problem for the admin
  • #10785 originally reported this problem for a small subset of cases (when the PK is a custom field)

Attachments (1)

t11319-r11007.diff (2.3 KB ) - added by Russell Keith-Magee 15 years ago.
Regression tests demonstrating problem

Download all attachments as: .zip

Change History (22)

by Russell Keith-Magee, 15 years ago

Attachment: t11319-r11007.diff added

Regression tests demonstrating problem

comment:1 by Alex Gaynor, 15 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Russell Keith-Magee, 14 years ago

#11938 was another report, with some extra test cases.

comment:3 by Antti Kaihola, 14 years ago

Work-around

Given

class Poll(models.Model):
    code = models.CharField(max_length=8)
    question = models.CharField(max_length=200)

class Choice(models.Model):
    poll = models.ForeignKey(Poll, to_field='code')
    choice = models.CharField(max_length=200)
    votes = models.IntegerField()

instead of

Choice.objects.filter(poll=poll_instance)

you can use

Choice.objects.filter(poll__code=poll_instance.code)

The problem is limited to queries. Creating objects works correctly:

Choice(poll=poll_instance, choice='First choice', votes=100).save()
Choice.objects.create(poll=poll_instance, choice='Other choice', votes=1)
Poll.choice_set.create(choice='Bad choice', votes=10000000)

Beware that get_or_create() doesn't work, though!

comment:4 by Russell Keith-Magee, 14 years ago

milestone: 1.21.3

Not critical for 1.2

comment:5 by kmishler, 14 years ago

Cc: kmishler@… added

comment:6 by Russell Keith-Magee, 14 years ago

#13343 reported the same problem, but with an effect in m2m relations.

comment:7 by Ramiro Morales, 14 years ago

#13574 reported the same issue as #13343 and contained a patch with tests fixing the particular m2m case.

comment:8 by Carsten Fuchs, 14 years ago

Cc: Carsten Fuchs added

comment:9 by Russell Keith-Magee, 13 years ago

See #14056 for a possibly related issue.

comment:10 by Andrey Golovizin, 13 years ago

Cc: Andrey Golovizin added

comment:11 by Fabián Ezequiel Gallina, 13 years ago

Owner: changed from nobody to Fabián Ezequiel Gallina

comment:12 by Carl Meyer, 13 years ago

Owner: changed from Fabián Ezequiel Gallina to Carl Meyer

Failing tests updated to trunk (and non-doctest) here: https://github.com/carljm/django/compare/master...t11319

I'm working on this now, so reassigning to myself. fgallina, if you've made any progress, please post an update so I don't duplicate work.

comment:13 by Carl Meyer, 13 years ago

Has patch: set
Patch needs improvement: set

As best I can tell (based on being able to uncomment the commented test in custom_pk and have it pass) case (1) from the description here has already been fixed sometime in the past two years, leaving just case (2) -- to_field.

I've added tests for various queries (some currently-broken, one currently-working) involving an FK with to_field, and my github branch linked above now fixes all of the broken cases, and the full test suite passes under Postgres and SQLite (queries tests pass under MySQL, InnoDB and MyISAM). That branch also fixes the cascade-deletion symptom in #15121, which I closed as duplicate of this.

I won't claim certainty that the fixes in that branch are the ideal fixes for this, and am not ready to commit this without some other ORM-experienced eyes on it first. But -- all tests pass, and that's at least a useful starting point.

comment:14 by Russell Keith-Magee, 13 years ago

@carljm -- thanks for taking a look at this one Carl.

The change to related.py looks correct to me.

The change to query.py raises a slight bell to me, though. It's a definite improvement, catching at least one problem case. However, are you sure that this is only a problem with recursive models? I would have thought that branch of code would have triggered on any reverse to_field lookup -- in which case, 'rel.to_field is None'-type logic, analogous to that used by your change in related.py, would seem appropriate.

That said, it looks like the queries test takes reverse to_field queries for a bit of a walk; so I might be completely off.

So - it's probably worth having a bit of a poke in that general area to see if you can manufacture a test case that breaks; if you can't, and the test suite passes, I'd be happy to see this land in trunk.

comment:15 by Carl Meyer, 13 years ago

The change in query.py may indeed be wrong, but if so it's not because it needs to be broader; rather because it (arguably?) shouldn't be there at all.

The "to_field" argument to a ForeignKey generally only applies to the target model of that FK, obviously: if you're traversing that same relationship in the reverse direction, the lookup should just use the primary key of the model on which the FK is defined.

The problem is that the same RelatedField (ForeignKey or OneToOneField) object is asked to "get_prep_lookup" the lookup value in either case, regardless of whether the lookup is taking place in a forwards or reverse direction across that RelatedField. And currently, it's asked to do so without ever being informed which type of lookup is happening!

I guard the to_field lookup in RelatedField._pk_trace with an isinstance check, to make sure the value is actually an instance of the model the FK points to (or a subclass of it). In most cases, this is enough to prevent use of to_field on a reverse FK traversal, because normally the two sides of an FK are two different model classes. But in the case of a recursive FK that uses to_field, both sides of the FK are the same model class, so the isinstance check passes either way, and to_field is used on the reverse lookup too.

The change I'd made in query.py worked around this specific case by also using the to_field column name rather than the PK column name in the recursive reverse lookup case. Then it's ok that _pk_trace doesn't know the difference and preps the to_field value in either case.

If you expect recursive FKs to work just like other FKs, this is the wrong fix; the to_field should only be used for lookups in the forwards direction. If you expect to_field to be used for lookups in both directions on a recursive FK (only), because in that specific case it can, it might be the right fix. I'm not actually sure how to choose between those - either one should result in a correct query.

I've now updated my branch with the alternative fix, which requires significantly more invasive changes, because the knowledge of whether it's a forward or reverse lookup currently exists only in Query.setup_joins(), and has to be passed through the Constraint object in order to get to RelatedField.get_prep_lookup(). It also requires adding a new get_prep_lookup_reverse() method to Fields, in order to avoid requiring all custom Field classes to update their get_prep_lookup() methods with a new parameter.

I've run this fix through the same tests as above (full test suite on SQLite and Postgres, "queries" and "delete_regress" on MySQL, ISAM and InnoDB), and everything passes this way as well.

I think a case can be made for either version of the fix.

The case for the first fix is that it's much less invasive, and it's not unreasonable to apply to_field in lookups both directions on a recursive FK; with the same model on each end, there's no reason to_field can't work just fine both directions.

The case for the second fix is twofold: 1) That it really seems like RelatedField _ought_ to know whether it's prepping a value for a forward or reverse lookup, so perhaps the invasiveness is justified, and 2) That the behavior of ForeignKey.to_field shouldn't change just because its a recursive FK.

There is one concrete difference between the two fixes. It changes whether you need to use the PK value or the to_field value if specifying the value directly in the lookup, rather than providing an object. In that sense, the first (less-invasive) fix is backwards-incompatible for anyone currently working around this bug on a recursive FK by providing using "=obj.pk" instead of "=obj" in their lookup; they'd have to switch to "=obj.to_field_name" (or just switch to "=obj").

Sorry for the novel - thoughts on this?

comment:16 by Carl Meyer, 13 years ago

Discussed this with Alex Gaynor on IRC. Using this Category class as an example, we're both agreed that the first fix results in more intuitive usage:

class Category(models.Model):
    name = models.CharField(max_length=10)
    parent = models.ForeignKey("self", to_field="name", related_name="children")

With either fix, you can query it this way in the forward direction: Category.objects.filter(parent="Kittehs") and can query it using a full Category object in either direction.

Currently, and if we make the second fix, you have to query in the reverse direction using the PK: Category.objects.filter(children=1)

Whereas if we make the first fix, you could query like this: Category.objects.filter(children="Noms")

Seems clear that the latter is more intuitive. And it requires the less-invasive fix (which I've pushed to https://github.com/carljm/django/compare/master...t11319-simpler for comparison).

I'd be in favor of making this fix; my only remaining question is whether the backwards-incompatible change is a concern. It's a very rare edge case of an edge case: I think we can safely call it part of a bugfix.

comment:17 by Russell Keith-Magee, 13 years ago

Needs documentation: set

There's a fine line between backwards incompatibility and an edge case bug.

The Category example makes more sense to me when the to_field is used for queries in both directions, so I'm inclined to agree that the behavior of 'foreign key on self with to_field' is a bug that hasn't been noticed because it's an edge case of an edge case.

My only suggested addition to the patch would be some documentation to describe the edge case -- presumably in the documentation for to_field.

in reply to:  17 comment:18 by Carl Meyer, 13 years ago

Needs documentation: unset

Replying to russellm:

My only suggested addition to the patch would be some documentation to describe the edge case -- presumably in the documentation for to_field.

I spent a little while working on some draft documentation, and my conclusion is that I think no addition to the docs is actually warranted here. The behavior we are fixing it to is quite intuitive (which is why we can consider it a bug that it ever was otherwise), and the explanation of why it's even worth explaining at all is comparatively difficult (requires explanation of what is meant by "forward" or "reverse" lookups across a self-referential foreign key, which is hard to explain clearly without a full example). In the end, all I can come up with is either a short one-sentence note that feels opaque and probably useless, or a longer explanation, using the Category example, that is IMO too many paragraphs wasted on such an obscure case, given that I don't think someone running across this behavior is likely to find it particularly surprising or noteworthy.

Anyway, I pushed the long-version draft docs to https://github.com/carljm/django/commit/f949b52fea10f06423648c476878a2218587db66 - the short and opaque version would consist of only the first sentence of that (and probably eliminating the versionadded marker? We don't usually do that for bugfixes, but it seems like it's needed if we're giving a full code example that we know doesn't work on older versions...). As I said, I think the best option here is no docs change at all.

The one other thing that's making me uncomfortable about committing this as a fix here, is that right now this bug is the only open reference to some unspecified "deep-seated issue" with get_db_prep_lookup and related fields that Malcolm was apparently working on at some point (discussed in #10243 and #10785). Currently my best theory is that that issue probably has to do with what I discuss above: how get_db_prep_lookup on a related field is used for both forwards and reverse lookups without knowing which it is. If that's the issue, I think I'm OK saying that at this point it doesn't appear to be something that needs to be fixed for any practical use case. But I'd be glad for any light Russell or Malcolm could shed on what this "deep-seated issue" is/was, and whether we should be keeping a bug open for it.

comment:19 by Carl Meyer, 13 years ago

Resolution: fixed
Status: newclosed

(In [15303]) Fixed #11319 - Added lookup support for ForeignKey.to_field. Also reverted no-longer-needed model formsets workaround for lack of such support from r10756. Thanks Russell and Alex for review.

comment:20 by Carl Meyer, 13 years ago

(In [15304]) [1.2.X] Fixed #11319 - Added lookup support for ForeignKey.to_field. Also reverted no-longer-needed model formsets workaround for lack of such support from r10756. Thanks Russell and Alex for review.

Backport of r15303 from trunk.

comment:21 by Jacob, 12 years ago

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top