#22046 closed Bug (wontfix)
unhelpful queryset handling for model formsets with data
Reported by: | Jim Bailey | Owned by: | |
---|---|---|---|
Component: | Forms | Version: | 1.6 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
When a model formset is constructed with data there are two related issues with the queryset handling.
The first is a performance issue - if a queryset is not specified then the _object_dict cache gets filled with every model instance in the database even if the data only deals with a subset of the instances.
The second problem can happen when a queryset is specified, and can lead to unexpected duplication of records in the database. If the specified queryset does not cover all the instances specified in the data, then the missmatched instances are assumed to be new, their pk is ignored, and they are written as new instances on save.
For example, a user has their pagination set to 2 items per page. There are the following Authors in the system:
(1, 'a')
(2, 'c')
(3, 'e')
(4, 'g')
They load the second page of authors sorted by name so the formset has details for (3, 'e') and (4, 'g').
Another user then adds another author (5, 'b').
The first user then saves the page without changes. The model formset is contructed with the data (3, 'e') and (4, 'g'), but also a queryset for the second page of authors sorted by name, which now contains (2, 'c') and (3, 'e').
The records with pk 3 is found and updated as expected. The record with pk 4 is not found in the queryset, so a new
Author instance is created: (6, 'g').
There are now two authors with the name 'g'.
I think the solution to both issues is the same. When a model formset is constructed with data the only records that should be loaded from the database and operated on are those specified by the pks in the data. The queryset used to pull in that data can and should be constructed automatically from the data. Any queryset specified in the constructor should be ignored.
Change History (10)
comment:1 by , 11 years ago
Has patch: | set |
---|
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 11 years ago
Easy pickings: | unset |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
The pull request does not address the second problem described in this ticket. The pull request needs more work and testing. See the attached test, which reproduces the second problem as described.
def test_model_formset_with_paginated_form_data(self): # saving a formset that contains a subset of the view queryset should # not overwrite existing data with different pks. # see #22046 Author.objects.create(pk=1, name='a') Author.objects.create(pk=2, name='c') Author.objects.create(pk=3, name='e') Author.objects.create(pk=4, name='g') data = { 'form-TOTAL_FORMS': 2, 'form-INITIAL_FORMS': 2, 'form-MAX_NUM_FORMS': 2, 'form-0-id': '3', 'form-0-name': 'c', 'form-1-id': '4', 'form-1-name': 'g', } FormSet = modelformset_factory(Author, fields=('name', )) formset = FormSet(data=data, queryset=Author.objects.order_by('name')) # adding another author before saving the form exposes the issue Author.objects.create(pk=5, name='b') formset.save() # expect only one author named `c' self.assertEqual(1, Author.objects.filter(name='c').count())
comment:4 by , 11 years ago
Hello, thank you very much for taking a look.
I do not agree with the expectation stated in test_model_formset_with_paginated_form_data(). What I would expect is:
Before the form is saved:
pk | name ---+----- 1 | a 2 | c 3 | e 4 | g 5 | b
The form data specifies that the record with id/pk 3 should be updated to have name 'c' and the record with id/pk 4 should be updated to have name 'g', so after the form is saved:
pk | name ---+----- 1 | a 2 | c 3 | c 4 | g 5 | b
So there should now be two authors named 'c'.
I don't think this new test really touches on any of the issues raised by this ticket as the specified queryset does contain all the records present in the form. The order is irrelevant. This is why this test fails both with and without my patch, and if it is corrected (there actually should be two 'c' authors) then it passes both with and without my patch.
I feel that test_model_formset_with_mismatched_queryset() adequately tests problem 2 - that the correct record is updated (instead of a new one being created) despite the specified queryset not covering the form's record.
comment:5 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:6 by , 11 years ago
Jim,
This is my mistake. The data section for my test should have been:
data = { 'form-TOTAL_FORMS': 2, 'form-INITIAL_FORMS': 2, 'form-MAX_NUM_FORMS': 2, 'form-0-id': '3', 'form-0-name': 'e', # this should have been e, not c 'form-1-id': '4', 'form-1-name': 'g', }
I will update and check again.
comment:7 by , 11 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:8 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
While the security model for formsets is currently somewhat ill-defined, I don't think a security model of "if you expose a formset for a model class, every instance of that model class is available for editing by anyone with access to POST to that formset" is what we want.
comment:9 by , 9 years ago
@timgraham
I accept your last comment, but I don't see how that results in a WONTFIX for this bug. As far as I can see, it's a real bug - when you save a model formset, and the underlying queryset returns different rows from what it did previously (which can happen on various insertions/deletions/changes), then you are going to get very unexpected behaviour.
AFAICS, the logic of BaseModelFormSet._existing_object
is only correct if the queryset passed in to the formset contains all applicable records both times.
There is also the security concern that it is currently very easy to get this wrong. If you forget to pass in the queryset argument to the formset when you are POSTing, then you will have a major security issue, because the user can specify any ID they like and update the record.
comment:10 by , 9 years ago
The fragility of the current approach of matching data to instances in the database seems to be tracked in #9076, #15574, and possibly others. I guess we could reclassify this as a duplicate if you feel that's better (wontfix seems to be mainly about the approach proposed here) or clarify how this ticket is different from those others.
To address the security concern you brought up, I imagine would could start a deprecation path toward requiring queryset
to be specified, similar to how we now require ModelForm
's Meta.fields
or exclude
to be specified. Shall we open a ticket for that?
I have added unit tests and a patch on this branch:
https://github.com/dgym/django/tree/ticket_22046
I have also created a pull request here:
https://github.com/django/django/pull/2277