#18306 closed Cleanup/optimization (fixed)
Deferred models should automatically issue update_fields when saving
Reported by: | Anssi Kääriäinen | Owned by: | Anssi Kääriäinen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | niwi@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When saving a model fetched through deferred model loading (.only() or .defer()) and then saving these models it would be useful to issue an update_fields automatically.
Currently, all the model's deferred fields are fetched from the DB and then saved back - this doesn't make sense. Instead, doing
(assume Author with fields age + name) a = Author.objects.only('name').get(pk=1) a.name = 'Anssi' a.save()
should be equivalent to doing:
a = Author.objects.only('name').get(pk=1) a.name = 'Anssi' a.save(update_fields=['name'])
In addition, the following must work:
a = Author.objects.only('name').get(pk=1) a.name = 'Anssi' a.age = 31 a.save()
both age and name must be saved in this case.
Also, if save is given update_fields argument, it must override the deferred model argument.
I would like to investigate the possibility of storing the fields to update in model._state.update_fields - the reason is that this would be a nice hook for 3rd party model subclasses which could use that variable to implement automatic tracking of changed fields. There is a lot of discussion of this feature in #4102, and the decision there seems clear: we can't include automatic change tracking in Django core. However, allowing easy creation of state-tracking subclasses would be a nice addition.
Change History (23)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
The implementation looks suspiciously clean... :) However there is going to be some problems to get the updating of manually set fields to work correctly. (the a.age = 31 case in original description). Apart of this problem the patch needs some documentation (only()/defer() docs need a mention of this, maybe also in .save()).
If it turns out to be at all problematic to support the model._state.update_fields attribute lets just leave that out from this ticket. I'd like to try that idea out, but there is no need to complicate this ticket with that idea.
comment:3 by , 13 years ago
I understand! I will try to implement and integrate, and if integrated well, great for everyone! And if not, then leave it as is.
I have not written any kind of documentation, because I'm still doing tests and I'm playing a bit. When ready, do a pull-request and review the patch!
comment:4 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I've been testing, and all cases you mention, and work fine. In a few days I will work on the documentation and other aspects that you mentioned in the issue.
comment:6 by , 13 years ago
Needs documentation: | unset |
---|---|
Owner: | changed from | to
Patch needs improvement: | unset |
Status: | assigned → new |
Looks excellent. I will try to find time to work this into core in the next couple of days.
comment:7 by , 13 years ago
Patch needs improvement: | set |
---|
I spotted two issues:
- When doing a deferred model .save() with using argument, where the using is to other database than where the object was fetched, the "update_fields" machinery should not be used. This is a minor corner case, but should still work.
- The "deferred fields" is remembered in instance._meta, and it is altered when a field is set to a value - this results in changing the "deferred fields" property for _all_ instances, not just for the instance which was changed. This is a blocker issue.
I see two ways forward:
- remember the loaded fields (instead of deferred fields) in instance._state instead of instance._meta.
- do not remember the deferred fields at all. Instead check just for instance._deferred. If set, then go through all fields and check if any of the fields are instances of DeferredAttribute. Those fields which are instances of DeferredAttribute are the deferred fields.
The second approach should be very robust, and I think it should be used. The check to use is:
for field in self._meta.fields: if isinstance(self.__class__.__dict__.get(field.attname), DeferredAttribute): # add to deferred fields.
See: https://github.com/akaariai/django/tree/ticket_18306 for details. There are also some other added tests.
I stumbled upon other issues while working on this ticket, see https://github.com/akaariai/django/tree/defer_inheritance_pk, #18343. Some (well at least one) of the added tests need this work to pass.
comment:8 by , 13 years ago
You're absolutely right! The state can not be save on _meta attribute!
I use the first aproach (store deferred fields on modelinstance._state.deferred_fields
), because the second aproache not works correctlly. self.__class__.__dict__
always contains DeferredAttribute
, has been modified the field value or not.
With my changes and your changes merged, fails two tests with unexpected number of queries:
====================================================================== FAIL: test_update_fields_fk_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 101, in test_update_fields_fk_defer e1.save() File "../django/test/testcases.py", line 273, in __exit__ executed, self.num AssertionError: 2 queries executed, 1 expected
Generated queries:
(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,) (0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 2 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(2, 1)
====================================================================== FAIL: test_update_fields_inheritance_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 89, in test_update_fields_inheritance_defer e1.save() File "../django/test/testcases.py", line 273, in __exit__ executed, self.num AssertionError: 4 queries executed, 1 expected ----------------------------------------------------------------------
Generated queries:
(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,) (0.000) UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1) (0.000) SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,) (0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(1, 1)
I'm not sure it's appropriate behavior. But not for sure what would be appropriate.
All changes are in the branch mentioned in my first comment.
comment:9 by , 13 years ago
Cc: | added |
---|
comment:10 by , 13 years ago
Could you check the queries with https://github.com/django/django/pull/80 - it contains some changes which should reduce the need for queries in inheritance cases, and IIRC the tests I added relied on this.
One way to check this is to create a new branch from your local development branch of this feature ("git checkout -b test_with_pull_80" when in the local dev branch), then do "curl https://github.com/django/django/pull/80.patch |git am" and run the tests again.
Don't update your pull request or your local development branch directly with the pull-80 stuff, it should be handled separately.
I don't have time to look into the patch in detail just now.
comment:11 by , 13 years ago
Thank you very much, I will try your changes.
Your review the patch when you have time, there is still enough time for 1.5
comment:12 by , 13 years ago
I did a little digging for the reasons of the test failure. The reason is that the .defer/.only code stores field.attname in the skip list, but the update_fields checks for field.name. So, the save code sees 'profile' as a loaded field even if it isn't one, and tries to save it which results in extra queries. There is also one extra query caused by missing the code from pull/80.
There are two problems here - first is that using both field.attname and field.name must work in update_only_fields argument. I don't believe this works correctly currently, at least it isn't tested. This means that both 'profile_id', and 'profile' in update_fields argument must work. This should be handled in separate ticket.
The other problem is that the .defer/.only logic in models/query.py doesn't use consitently the field.attname (or the field.name) when creating the deferred model instance. This must be fixed again in another ticket before fixing this ticket.
So, at this point we have already 3 must fix before tickets. The pull/80 issue is IMHO ready. There is a ticket for the defer/only logic for field.attname (#17485). I created a ticket for the ability to use field.attname in update_fields (ticket #18362).
comment:13 by , 12 years ago
Good work on "Cleaned up deferred model implementation", this fixes a lot of number of queries. But continued to have a strange case:
UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1) SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,) UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1, 1)
With test code:
def test_update_fields_inheritance_defer(self): profile_boss = Profile.objects.create(name='Boss', salary=3000) e1 = Employee.objects.create(name='Sara', gender='F', employee_num=1, profile=profile_boss) e1 = Employee.objects.only('name').get(pk=e1.pk) e1.name = 'Linda' with self.assertNumQueries(3): e1.save()
In my opinion, this case should only run a single query. I will modify the test for 3 queryes expect, but I'm waiting for you comment me here.
Current Status:
comment:14 by , 12 years ago
One query seems like the correct result. I guess for some reason the profile_id isn't seen as a deferred field by the code. Unfortunately I do not have time to look into this more just now.
Thanks for all your work on this!
comment:15 by , 12 years ago
;)
I've been debugging the code, and I found the solution at 3 queries. To test, apply first patch #18362, so that the code is consistent.
When you have time for this, review the code! Do not worry about time!
comment:16 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:17 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Sorry, wrong ticket number in commit message.
comment:18 by , 12 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
I have reviewed and polished the patch written by niwi. I created a pull request as I think the feature is now ready for commit. Please review, or at least do a cursory glance for obvious mistakes.
I wonder if I should squash the two commits into one...
I ended up getting rid of the model._state.deferred_fields and instead using the instance's and class' dict to check the deferred state of the object when saving. The idea of using ._state was mine (or at least encouraged by me), sorry for pushing the wrong solution.
comment:19 by , 12 years ago
I liked more the solution keeping not diferred fields on models _state
, but also I like this solution. Surely, the new implementation is better;)
Great work!
comment:20 by , 12 years ago
The problem I faced was that there were multiple places in models/query.py where the deferred_fields would need to be calculated. The original implementation didn't take into account select_related (where get_cached_row() can return deferred instances), and qs.raw(). These just work with the current implementation. In addition, there is no additional state bookkeeping which is always a bonus. One less place for bugs to hide :)
If one wants to do dirty field state tracking, the solution is to use the model instance itself or it's state to keep the dirty field info, and then just overrider the .save() method and use update_fields in there for the dirty fields.
BTW I have been wondering if we should allow 'pk' in the update_fields. If you say "update_fields=['pk']"
, then the primary key will be updated as part of the query. I know there is no cascade support in Django, but one can always use custom db schemas and ON UPDATE CASCADE. Seems useful to me. But of course not this ticket's problem...
comment:21 by , 12 years ago
Interesting! did not have in mind such cases.
Of course, the current implementation is better and takes a lot of duplication of code.
Thanks! ;)
comment:22 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
The current work is done on this branch: https://github.com/niwibe/django/commits/defer_only_interaction
So far I have not sent the request to pull.