Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#18306 closed Cleanup/optimization (fixed)

Deferred models should automatically issue update_fields when saving

Reported by: Anssi Kääriäinen Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: niwi@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When saving a model fetched through deferred model loading (.only() or .defer()) and then saving these models it would be useful to issue an update_fields automatically.

Currently, all the model's deferred fields are fetched from the DB and then saved back - this doesn't make sense. Instead, doing

(assume Author with fields age + name)
a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.save()

should be equivalent to doing:

a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.save(update_fields=['name'])

In addition, the following must work:

a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.age = 31
a.save()

both age and name must be saved in this case.

Also, if save is given update_fields argument, it must override the deferred model argument.

I would like to investigate the possibility of storing the fields to update in model._state.update_fields - the reason is that this would be a nice hook for 3rd party model subclasses which could use that variable to implement automatic tracking of changed fields. There is a lot of discussion of this feature in #4102, and the decision there seems clear: we can't include automatic change tracking in Django core. However, allowing easy creation of state-tracking subclasses would be a nice addition.

Change History (23)

comment:1 by anonymous, 13 years ago

The current work is done on this branch: https://github.com/niwibe/django/commits/defer_only_interaction

So far I have not sent the request to pull.

comment:2 by Anssi Kääriäinen, 13 years ago

Has patch: set
Needs documentation: set
Patch needs improvement: set
Triage Stage: UnreviewedAccepted

The implementation looks suspiciously clean... :) However there is going to be some problems to get the updating of manually set fields to work correctly. (the a.age = 31 case in original description). Apart of this problem the patch needs some documentation (only()/defer() docs need a mention of this, maybe also in .save()).

If it turns out to be at all problematic to support the model._state.update_fields attribute lets just leave that out from this ticket. I'd like to try that idea out, but there is no need to complicate this ticket with that idea.

comment:3 by anonymous, 13 years ago

I understand! I will try to implement and integrate, and if integrated well, great for everyone! And if not, then leave it as is.

I have not written any kind of documentation, because I'm still doing tests and I'm playing a bit. When ready, do a pull-request and review the patch!

comment:4 by Andrei Antoukh, 13 years ago

Owner: changed from nobody to Andrei Antoukh
Status: newassigned

I've been testing, and all cases you mention work fine. In a few days I will work on the documentation and other aspects that you mentioned in the issue.

Last edited 13 years ago by Andrei Antoukh (previous) (diff)

comment:5 by Andrei Antoukh, 13 years ago

comment:6 by Anssi Kääriäinen, 13 years ago

Needs documentation: unset
Owner: changed from Andrei Antoukh to Anssi Kääriäinen
Patch needs improvement: unset
Status: assignednew

Looks excellent. I will try to find time to work this into core in the next couple of days.

comment:7 by Anssi Kääriäinen, 13 years ago

Patch needs improvement: set

I spotted two issues:

  1. When doing a deferred model .save() with using argument, where the using is to other database than where the object was fetched, the "update_fields" machinery should not be used. This is a minor corner case, but should still work.
  2. The "deferred fields" is remembered in instance._meta, and it is altered when a field is set to a value - this results in changing the "deferred fields" property for _all_ instances, not just for the instance which was changed. This is a blocker issue.

I see two ways forward:

  • remember the loaded fields (instead of deferred fields) in instance._state instead of instance._meta.
  • do not remember the deferred fields at all. Instead check just for instance._deferred. If set, then go through all fields and check if any of the fields are instances of DeferredAttribute. Those fields which are instances of DeferredAttribute are the deferred fields.

The second approach should be very robust, and I think it should be used. The check to use is:

for field in self._meta.fields:
    if isinstance(self.__class__.__dict__.get(field.attname), DeferredAttribute):
        # add to deferred fields.

See: https://github.com/akaariai/django/tree/ticket_18306 for details. There are also some other added tests.

I stumbled upon other issues while working on this ticket, see https://github.com/akaariai/django/tree/defer_inheritance_pk, #18343. Some (well at least one) of the added tests need this work to pass.

comment:8 by Andrei Antoukh, 13 years ago

You're absolutely right! The state can not be save on _meta attribute!

I use the first aproach (store deferred fields on modelinstance._state.deferred_fields), because the second aproache not works correctlly. self.__class__.__dict__ always contains DeferredAttribute, has been modified the field value or not.

With my changes and your changes merged, fails two tests with unexpected number of queries:

======================================================================
FAIL: test_update_fields_fk_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 101, in test_update_fields_fk_defer
    e1.save()
  File "../django/test/testcases.py", line 273, in __exit__
    executed, self.num
AssertionError: 2 queries executed, 1 expected

Generated queries:

(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 2 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(2, 1)
======================================================================
FAIL: test_update_fields_inheritance_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 89, in test_update_fields_inheritance_defer
    e1.save()
  File "../django/test/testcases.py", line 273, in __exit__
    executed, self.num
AssertionError: 4 queries executed, 1 expected

----------------------------------------------------------------------

Generated queries:

(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1)
(0.000) SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(1, 1)

I'm not sure it's appropriate behavior. But not for sure what would be appropriate.
All changes are in the branch mentioned in my first comment.

comment:9 by Andrei Antoukh, 13 years ago

Cc: niwi@… added

comment:10 by Anssi Kääriäinen, 13 years ago

Could you check the queries with https://github.com/django/django/pull/80 - it contains some changes which should reduce the need for queries in inheritance cases, and IIRC the tests I added relied on this.

One way to check this is to create a new branch from your local development branch of this feature ("git checkout -b test_with_pull_80" when in the local dev branch), then do "curl https://github.com/django/django/pull/80.patch |git am" and run the tests again.

Don't update your pull request or your local development branch directly with the pull-80 stuff, it should be handled separately.

I don't have time to look into the patch in detail just now.

comment:11 by Andrei Antoukh, 13 years ago

Thank you very much, I will try your changes.

Your review the patch when you have time, there is still enough time for 1.5

comment:12 by Anssi Kääriäinen, 13 years ago

I did a little digging for the reasons of the test failure. The reason is that the .defer/.only code stores field.attname in the skip list, but the update_fields checks for field.name. So, the save code sees 'profile' as a loaded field even if it isn't one, and tries to save it which results in extra queries. There is also one extra query caused by missing the code from pull/80.

There are two problems here - first is that using both field.attname and field.name must work in update_only_fields argument. I don't believe this works correctly currently, at least it isn't tested. This means that both 'profile_id', and 'profile' in update_fields argument must work. This should be handled in separate ticket.

The other problem is that the .defer/.only logic in models/query.py doesn't use consitently the field.attname (or the field.name) when creating the deferred model instance. This must be fixed again in another ticket before fixing this ticket.

So, at this point we have already 3 must fix before tickets. The pull/80 issue is IMHO ready. There is a ticket for the defer/only logic for field.attname (#17485). I created a ticket for the ability to use field.attname in update_fields (ticket #18362).

comment:13 by Andrei Antoukh, 13 years ago

Good work on "Cleaned up deferred model implementation", this fixes a lot of number of queries. But continued to have a strange case:

UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1)
SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1, 1)

With test code:

def test_update_fields_inheritance_defer(self):
    profile_boss = Profile.objects.create(name='Boss', salary=3000)
    e1 = Employee.objects.create(name='Sara', gender='F',
        employee_num=1, profile=profile_boss)
    e1 = Employee.objects.only('name').get(pk=e1.pk)
    e1.name = 'Linda'
    with self.assertNumQueries(3):
        e1.save()

In my opinion, this case should only run a single query. I will modify the test for 3 queryes expect, but I'm waiting for you comment me here.

Current Status:

  • #18362 now has attached patch
  • #17485 the solucion proposed seems rasonable (I update the patch for latest version)
  • #18343 now fixed and commited to master (great work)

comment:14 by Anssi Kääriäinen, 13 years ago

One query seems like the correct result. I guess for some reason the profile_id isn't seen as a deferred field by the code. Unfortunately I do not have time to look into this more just now.

Thanks for all your work on this!

comment:15 by Andrei Antoukh, 13 years ago

;)

I've been debugging the code, and I found the solution at 3 queries. To test, apply first patch #18362, so that the code is consistent.

When you have time for this, review the code! Do not worry about time!

comment:16 by Claude Paroz <claude@…>, 13 years ago

Resolution: fixed
Status: newclosed

In [4f53e77f0720bd4d74b1b08857db8f7d32c65008]:

Fixed #18306 -- Removed most of GeoDjango-specific deployment docs

comment:17 by Claude Paroz, 13 years ago

Resolution: fixed
Status: closedreopened

Sorry, wrong ticket number in commit message.

comment:18 by Anssi Kääriäinen, 13 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

I have reviewed and polished the patch written by niwi. I created a pull request as I think the feature is now ready for commit. Please review, or at least do a cursory glance for obvious mistakes.

I wonder if I should squash the two commits into one...

I ended up getting rid of the model._state.deferred_fields and instead using the instance's and class' dict to check the deferred state of the object when saving. The idea of using ._state was mine (or at least encouraged by me), sorry for pushing the wrong solution.

comment:19 by Andrei Antoukh, 13 years ago

I liked more the solution keeping not diferred fields on models _state, but also I like this solution. Surely, the new implementation is better;)

Great work!

Last edited 13 years ago by Andrei Antoukh (previous) (diff)

comment:20 by Anssi Kääriäinen, 13 years ago

The problem I faced was that there were multiple places in models/query.py where the deferred_fields would need to be calculated. The original implementation didn't take into account select_related (where get_cached_row() can return deferred instances), and qs.raw(). These just work with the current implementation. In addition, there is no additional state bookkeeping which is always a bonus. One less place for bugs to hide :)

If one wants to do dirty field state tracking, the solution is to use the model instance itself or it's state to keep the dirty field info, and then just overrider the .save() method and use update_fields in there for the dirty fields.

BTW I have been wondering if we should allow 'pk' in the update_fields. If you say "update_fields=['pk']", then the primary key will be updated as part of the query. I know there is no cascade support in Django, but one can always use custom db schemas and ON UPDATE CASCADE. Seems useful to me. But of course not this ticket's problem...

comment:21 by Andrei Antoukh, 13 years ago

Interesting! did not have in mind such cases.

Of course, the current implementation is better and takes a lot of duplication of code.

Thanks! ;)

comment:22 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: reopenedclosed

In [99321e30cebbffeafc6ae19f4f92a0a665cbf19b]:

Fixed #18306 -- Made deferred models issue update_fields on save

Deferred models now automatically update only the fields which are
loaded from the db (with .only() or .defer()). In addition, any field
set manually after the load is updated on save.

comment:23 by Anssi Kääriäinen <akaariai@…>, 12 years ago

In dad7eec6e1c1770f5d81d5c5ed2de296c1eca969:

Corrected links to only()/defer() in Model documentation

Refs #18306

Note: See TracTickets for help on using tickets.
Back to Top