Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#25693 closed Bug (fixed)

Data loss if a ManyToManyField is shadowed by Prefetch

Reported by: Ian Foote Owned by: Ian Foote
Component: Database layer (models, ORM) Version: 1.7
Severity: Release blocker Keywords:
Cc: tom@…, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ian Foote)

With two models:

class Book(Model):
    name = CharField(max_length=100)

class Author(Model):
    books = ManyToManyField(Book)

it is possible to create a prefetch query that loses data from Author.books:

poems = Book.objects.filter(name='Poems')
Author.objects.prefetch_related(
    Prefetch('books', queryset=poems, to_attr='books'),
)

When this queryset is evaluated, each Author's books is overridden by the poems queryset.

Change History (24)

comment:1 by Ian Foote, 8 years ago

Owner: changed from nobody to Ian Foote
Status: newassigned

comment:2 by Ian Foote, 8 years ago

Description: modified (diff)

comment:3 by Tom Christie, 8 years ago

Cc: tom@… added

comment:4 by Tom Christie, 8 years ago

Realities downstream issue: https://github.com/tomchristie/django-rest-framework/issues/2442

I assume this ticket is a duplicate of the (closed, wontfix) ticket here: https://code.djangoproject.com/ticket/21584

Version 0, edited 8 years ago by Tom Christie (next)

comment:5 by Aymeric Augustin, 8 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

I think these are different issues.


#21584 demonstrates getting an object, prefetching related objects, creating an additional related object: the list of prefetched related objects isn't refreshed automatically. (The description makes it look more complicated than it actually is.)

That's because parent.children_set is just a queryset. It's consistent with how querysets have always been working. There is exactly the same behavior without prefetech_related, with any queryset for which the results have been fetched.

The DRF issue Tom mentions is probably a consequence of this. I believe DRF should refetch data that may have been invalidated by the updates before serializing its response.


Now, this bug appears to be much worse becase incorrect data gets written back to the database (if I understand correctly).

Ian, do you know since which version of Django it occurs? I'm wondering if we can but controls in place to avoid assigning prefetched querysets to writable descriptors or preventing the write from occurring...

comment:6 by Marten Kenbeek, 8 years ago

This seems related to #25550. Would a backport fix this bug?

comment:7 by Ian Foote, 8 years ago

I've verified it on master and on 1.8. I assume it also applies to 1.9.

comment:8 by Simon Charette, 8 years ago

Cc: Simon Charette added

This issue also recently surfaced on the django-users mailing list.

Now, this bug appears to be much worse becase incorrect data gets written back to the database (if I understand correctly).

You do.

Ian, do you know since which version of Django it occurs? I'm wondering if we can but controls in place to avoid assigning prefetched querysets to writable descriptors or preventing the write from occurring...

This bug exists since the introduction of the Prefetch object (Django 1.7).

This seems related to #25550. Would a backport fix this bug?

The ticket is related but the issue will remain until the assignment is actually removed. It's currently only pending deprecation on master.

Since we'll have to live with this assignment issue until 2.0 the only viable solution I can think of at the moment would be to raise a ValueError if getattr(queryset.model, to_attr) is an instance of one of the problematic descriptors.

comment:9 by Simon Charette, 8 years ago

Note that the issue is even worse with GenericRelation on Django < 1.9 which used to issue a manager.clear() instead of relying on the revised logic added by #21169.

comment:10 by Ian Foote, 8 years ago

Has patch: set

comment:11 by Simon Charette, 8 years ago

Needs documentation: set
Patch needs improvement: set
Version: master1.7

The patch is looking good. It's only missing a release note for the 1.8 and 1.7 series.

comment:12 by Ian Foote, 8 years ago

Needs documentation: unset
Patch needs improvement: unset

comment:13 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In 46085737:

Fixed #25693 -- Prevented data loss with Prefetch and ManyToManyField.

Thanks to Jamie Matthews for finding and explaining the bug.

comment:14 by Tim Graham <timograham@…>, 8 years ago

In f9a08eb:

[1.9.x] Fixed #25693 -- Prevented data loss with Prefetch and ManyToManyField.

Thanks to Jamie Matthews for finding and explaining the bug.

Backport of 4608573788c04fc047da42b4b7b48fdee8136ad3 from master

comment:15 by Tim Graham <timograham@…>, 8 years ago

In 5fc9a1b8:

[1.8.x] Fixed #25693 -- Prevented data loss with Prefetch and ManyToManyField.

Thanks to Jamie Matthews for finding and explaining the bug.

Backport of 4608573788c04fc047da42b4b7b48fdee8136ad3 from master

comment:16 by Simon Charette <charette.s@…>, 8 years ago

In 6184cb9b:

[1.7.x] Fixed #25693 -- Prevented data loss with Prefetch and ManyToManyField.

Thanks to Jamie Matthews for finding and explaining the bug.

Backport of 4608573788c04fc047da42b4b7b48fdee8136ad3 from master

comment:17 by Simon Charette <charette.s@…>, 8 years ago

In 4a9c32f:

Refs #25693 -- Avoided redundant calls to get_fields() in to_attr validation.

comment:18 by Simon Charette <charette.s@…>, 8 years ago

In cc8c02fa:

Refs #25693 -- Added a regression test for to_attr validation on forward m2m.

comment:19 by Simon Charette <charette.s@…>, 8 years ago

In 0d1d307:

[1.9.x] Refs #25693 -- Avoided redundant calls to get_fields() in to_attr validation.

Backport of 4a9c32f5eece9030c2b568e930cec0c1ba8f1da0 from master

comment:20 by Simon Charette <charette.s@…>, 8 years ago

In 164cbdac:

[1.9.x] Refs #25693 -- Added a regression test for to_attr validation on forward m2m.

Backport of cc8c02fa0fa2119704d1c39ca8509850aef84acc from master

comment:21 by Simon Charette <charette.s@…>, 8 years ago

In a3baee2:

[1.8.x] Refs #25693 -- Avoided redundant calls to get_fields() in to_attr validation.

Backport of 4a9c32f5eece9030c2b568e930cec0c1ba8f1da0 from master

comment:22 by Simon Charette <charette.s@…>, 8 years ago

In ae461380:

[1.8.x] Refs #25693 -- Added a regression test for to_attr validation on forward m2m.

Backport of cc8c02fa0fa2119704d1c39ca8509850aef84acc from master

comment:23 by Simon Charette <charette.s@…>, 8 years ago

In 3d037b9:

[1.7.x] Refs #25693 -- Avoided redundant calls to get_fields() in to_attr validation.

Conflicts:

django/db/models/query.py

Backport of cc8c02fa0fa2119704d1c39ca8509850aef84acc from master

comment:24 by Simon Charette <charette.s@…>, 8 years ago

In fd14265:

[1.7.x] Refs #25693 -- Added a regression test for to_attr validation on forward m2m.

Backport of cc8c02fa0fa2119704d1c39ca8509850aef84acc from master

Note: See TracTickets for help on using tickets.
Back to Top