Opened 16 years ago

Closed 10 years ago

#8892 closed Bug (duplicate)

ForeignKey relation not saved as expected

Reported by: Julien Phalip Owned by: blacklwhite
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords:
Cc: blacklwhite Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Here are two simple models:

class ModelA(models.Model):
    name = models.CharField(max_length=10)

class ModelB(models.Model):
    name = models.CharField(max_length=10)
    a = models.ForeignKey(ModelA, blank=True, null=True)

Now, see the following sequence. When setting the relation *before* saving the related object, the relation itself is not properly saved:

>>> a = ModelA(name='foo')
>>> b = ModelB(name='bar')
>>> b.a = a # Set relation *before* saving related object
>>> a.save() # Save related object
>>> b.save()
>>> b.a
<ModelA: ModelA object>
>>> b = ModelB.objects.get(name='bar')
>>> b.a
<NoneType: None>

Logically I'd expect that, since a is saved before b, the relation would be properly saved but it is not.

Yet, if setting the relation *after* saving the related object, then it works fine (in the below example I only swapped 2 lines: b.a = a and a.save()):

>>> a = ModelA(name='foo')
>>> b = ModelB(name='bar')
>>> a.save() # Save related object
>>> b.a = a # Set relation *after* saving related object
>>> b.save()
>>> b.a
<ModelA: ModelA object>
>>> b = ModelB.objects.get(name='bar')
>>> b.a
<ModelA: ModelA object>

I've tested with MySQL and SQLite.

Attachments (2)

8892.patch (2.7 KB ) - added by blacklwhite 13 years ago.
t8892_2.patch (2.7 KB ) - added by blacklwhite 13 years ago.
new bugfix in (Reverse-)SingleRelatedObjectDescriptor and tests

Download all attachments as: .zip

Change History (26)

comment:1 by mrts, 16 years ago

Resolution: wontfix
Status: newclosed

This is quite expected as a doesn't have a primary key before saving. It can not have it theoretically. This is documented in http://docs.djangoproject.com/en/dev/ref/models/instances/#auto-incrementing-primary-keys .

comment:2 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: closedreopened

The documentation still needs to be clarified and we can look at raising a more explicit error message.

in reply to:  1 comment:3 by Julien Phalip, 16 years ago

Replying to mrts:

This is quite expected as a doesn't have a primary key before saving. It can not have it theoretically. This is documented in http://docs.djangoproject.com/en/dev/ref/models/instances/#auto-incrementing-primary-keys .

I already knew that it's only after saving the object once that it gets a primary key. But in this case I just assumed that creating the relation (b.a = a) would not be affected whether it was set before or after the related object was saved. I assumed that subsequently saving a then b would also save the relation, no matter the prior sequence of operations.

I've been using Django for a while now, and I've always been quite careful with this logic. But here I admit I was quite surprised by this behaviour (in fact, I was equally surprised that I didn't get hit by this before, which is why I identified it as a bug in Django).

Malcolm, you're talking about improving the error message, but the code above doesn't actually generate any error. In any case, I'd be happy to learn more about how that all works when the docs are updated.

comment:4 by Malcolm Tredinnick, 16 years ago

Triage Stage: UnreviewedAccepted

I know it doesn't generate an error. It should be changed from showing no error to showing something.

Your code is incorrect and Django should tell you that.

in reply to:  4 comment:5 by Julien Phalip, 16 years ago

Replying to mtredinnick:

I know it doesn't generate an error. It should be changed from showing no error to showing something.
Your code is incorrect and Django should tell you that.

That would be very helpful! Thanks Malcolm for this clarification.

comment:6 by Valera Grishin, 16 years ago

I step at the same problem last week and think julien is right in his assumption. It is quite natural to instantiate models first, than fill them with appropriate data, and finally save models in bulk.

Imagine a case when ModelA instance is irrelevant without corresponding ModelB instance. Separated saves will cause one model, say ModelA, to be saved, then find that ModelB cannot be saved for some reason, thus causing ModelA to be deleted right after it has been saved.

comment:7 by blacklwhite, 13 years ago

Cc: blacklwhite added
Owner: changed from nobody to blacklwhite
Severity: Normal
Status: reopenednew
Type: Uncategorized

comment:8 by Luke Plant, 13 years ago

Type: UncategorizedBug

by blacklwhite, 13 years ago

Attachment: 8892.patch added

comment:9 by blacklwhite, 13 years ago

Easy pickings: unset
Has patch: set
Triage Stage: AcceptedReady for checkin
UI/UX: unset

I fixed the bug as follows. If you save a model with a foreign key to a model which is not saved until this point, django will show an error:

>>> a = ModelA(name="a")
>>> b = ModelB(name="b")
>>> a.b = b
>>> a.save()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "django\db\models\base.py", line 492, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "django\db\models\base.py", line 521, in save_base
    self.refresh_foreign_keys()
  File "django\db\models\base.py", line 475, in refresh_foreign_keys
    ". Have you already saved the " + str(fk_ref) + " object?")
ValueError: Cannot find a primary key for b as a foreign key in an instance
 of ModelA. Have you already saved the b object?

If you save a related model after setting the related one to the first object, it works as expected.

>>> a = ModelA(name="a")
>>> b = ModelB(name="b")
>>> a.b = b
>>> b.save()
>>> a.save()

Problems I've considered:

  • Compatible to users accessing to the foreign key field as follows:
    >>> a = ModelA(name="a")
    >>> b = ModelB(name="b")
    >>> b.save()
    >>> a.b = b
    >>> a.b_id
    

So it is neccessary to set the id to a_id during the set method of a.b = b. However, the assertion that the fk of b is the same as a.b_id must be true before writing a to the database.

  • Not loading the related object from the database.
    >>> a = ModelA.objects.get(name="a")
    >>> a.name = "b"
    >>> a.save()
    

The instance of Object b will not be loaded from the database to reduce db-queries.

  • This patch does only try to set the primary key of the related object, if no primary key is set and a related object exists. In this way it will never do an additional database query. Of course it produces an overhead for every call of the save() method, but by comparison to the following database-query it is negligible.
Last edited 13 years ago by blacklwhite (previous) (diff)

comment:10 by blacklwhite, 13 years ago

Resolution: fixed
Status: newclosed

comment:11 by blacklwhite, 13 years ago

Resolution: fixed
Status: closedreopened

comment:12 by Aymeric Augustin, 13 years ago

Needs tests: set
Triage Stage: Ready for checkinAccepted

Please don't mark your own patches as RFC — see the contribution guidelines.

comment:13 by blacklwhite, 13 years ago

Sorry, I misinterpreted the guideline chart.

comment:14 by Luke Plant, 13 years ago

Patch needs improvement: set

@blacklwhite:

Thanks for having a go at this. I think this patch is the wrong approach, however. It is unnecessarily complicated, and changes the semantics of saving an object in some significant ways, by adding a new point where the object's fields are set from the related object.

We want a smaller fix, which only changes existing behaviour for the case when a developer has made a mistake - as in the example above. We already have checking at the point of assignment, such as checking that the object is of the right type, which is in SingleRelatedObjectDescriptor, and that's where the new check should be made. It should be a two liner.

Of course, this also needs tests.

On a point of style, you should use % interpolation for building up strings.

by blacklwhite, 13 years ago

Attachment: t8892_2.patch added

new bugfix in (Reverse-)SingleRelatedObjectDescriptor and tests

comment:15 by blacklwhite, 13 years ago

Needs tests: unset
Patch needs improvement: unset

Thank you for your feedback. Your hint made it much more simpler to find the correct piece of code to bugfix it. And - as you said - it fits formerly as a two liner into the existing code :-)

The patch will now raise an error and results in the following behaviour:

>>> a = ModelA(name="a")
>>> b = ModelB(name="b")
>>> a.b = b
ValueError: Cannot assign "<ModelB: ModelB object>": "ModelB.b" must have
a primary key.

It does the same the reversed scenario.

>>> a = ModelA(name="a")
>>> b = ModelB(name="b")
>>> b.a_set.add(a)

Is the error message ok or would it be better to append "You have to save <ModelB: ModelB object> before assignment." or something like that?
Are the both tests in the right class / method?
What about compatibility to people using a workaround like follows; don't considering about that?

>>> a = ModelA(name="a")
>>> b = ModelB(name="b")
>>> a.b = b
>>> print a.b.name
>>> b.save()
>>> a.b_id = a.b.id
>>> a.save()
Version 0, edited 13 years ago by blacklwhite (next)

comment:16 by Luke Plant, 13 years ago

Patch needs improvement: set

This is much better, but the tests don't test some of the new code. If you remove the hunk of the patch for SingleRelatedObjectDescriptor, the tests added still pass. (Hint: the cycle described on Test-driven development would have prevented this - always test first and make sure the test fails).

in reply to:  16 comment:17 by blacklwhite, 13 years ago

Replying to lukeplant:

This is much better, but the tests don't test some of the new code. If you remove the hunk of the patch for SingleRelatedObjectDescriptor, the tests added still pass. (Hint: the cycle described on Test-driven development would have prevented this - always test first and make sure the test fails).

I used the following command runtests.py --settings=test_sqlite many_to_one to test django. I've tried it again with the current django trunk and the first added assertion ends in a failed test. The second one ends in an errored test because of raising an IntegrityError.

After installing the patch the tests work fine.

I used TDD so your statement about the passing tests is curious for me. Where could my mistake be?

comment:18 by Anssi Kääriäinen, 12 years ago

I wonder if the set is the proper place to raise the error. One could construct model instances without ever saving them. Isn't the error really happening at save time?

The check would be to go though foreign key fields in model.save(). If the instance's __dict__ contains a model for the foreign key field's name, but the value for field.attname in self.__dict__ is missing or is None, then raise an error.

comment:19 by anonymous, 12 years ago

Here's a StackOverflow question about this

comment:20 by Aymeric Augustin, 12 years ago

Status: reopenednew

comment:21 by altlist, 10 years ago

I am a little confused why the original example isn't supported.

>>> a = ModelA(name='foo')
>>> b = ModelB(name='bar')
>>> b.a = a # Set relation *before* saving related object
>>> a.save() # Save related object
>>> a.pk
123
>>> b.a.pk
123
>>> b.save()
>>> b = ModelB.objects.get(name='bar')
>>> b.a
<NoneType: None>

While I agree the docs suggests auto-increment keys can't be determined upfront, the django model above shows a.pk is already known BEFORE I save b. Hence I don't see why Django can't save object b correctly.

Instead, I have to use the below suggested hack in the StackOverlow question. Yet this seems cumbersome and unnecessary.

>>> b.a = b.a
>>> b.save()
Last edited 10 years ago by altlist (previous) (diff)

comment:22 by Aymeric Augustin, 10 years ago

The original example isn't supported because, deep inside the ORM, b.a = a is implemented as b.a_id = a.id.

Changing this design would require rewriting most of the related fields code, which isn't for the faint of heart ;-)

comment:23 by Anssi Kääriäinen, 10 years ago

What we could possibly do is check that b.a.id == b.a_id. If not, then throw an error. This would break some code for sure, but on the other hand it would be a nice protection. Instead of saving corrupted data the ORM layer forces you to write code where both b.a.id and b.a_id agree on the value.

There are a couple of other options, too, like when a.save() is called, then b.a_id is set (requires that a knows it is assigned to b), or on b.save() use the value of b.a.id instead of b.a_id (this will break code that relies on saving b.a_id).

Raising an exception whenever b.a_id doesn't agree on b.a.id's value is worth considering. I believe exception is a lot better than the possibility for bugs that lead to silent data corruption.

comment:24 by Tim Graham, 10 years ago

Resolution: duplicate
Status: newclosed

Looks like a duplicate of #10811.

Note: See TracTickets for help on using tickets.
Back to Top