Code

Opened 2 years ago

Closed 2 years ago

Last modified 22 months ago

#18306 closed Cleanup/optimization (fixed)

Deferred models should automatically issue update_fields when saving

Reported by: akaariai Owned by: akaariai
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: niwi@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When saving a model fetched through deferred model loading (.only() or .defer()) and then saving these models it would be useful to issue an update_fields automatically.

Currently, all the model's deferred fields are fetched from the DB and then saved back - this doesn't make sense. Instead, doing

(assume Author with fields age + name)
a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.save()

should be equivalent to doing:

a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.save(update_fields=['name'])

In addition, the following must work:

a = Author.objects.only('name').get(pk=1)
a.name = 'Anssi'
a.age = 31
a.save()

both age and name must be saved in this case.

Also, if save is given update_fields argument, it must override the deferred model argument.

I would like to investigate the possibility of storing the fields to update in model._state.update_fields - the reason is that this would be a nice hook for 3rd party model subclasses which could use that variable to implement automatic tracking of changed fields. There is a lot of discussion of this feature in #4102, and the decision there seems clear: we can't include automatic change tracking in Django core. However, allowing easy creation of state-tracking subclasses would be a nice addition.

Attachments (0)

Change History (23)

comment:1 Changed 2 years ago by anonymous

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

The current work is done on this branch: https://github.com/niwibe/django/commits/defer_only_interaction

So far I have not sent the request to pull.

comment:2 Changed 2 years ago by akaariai

  • Has patch set
  • Needs documentation set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

The implementation looks suspiciously clean... :) However there is going to be some problems to get the updating of manually set fields to work correctly. (the a.age = 31 case in original description). Apart of this problem the patch needs some documentation (only()/defer() docs need a mention of this, maybe also in .save()).

If it turns out to be at all problematic to support the model._state.update_fields attribute lets just leave that out from this ticket. I'd like to try that idea out, but there is no need to complicate this ticket with that idea.

comment:3 Changed 2 years ago by anonymous

I understand! I will try to implement and integrate, and if integrated well, great for everyone! And if not, then leave it as is.

I have not written any kind of documentation, because I'm still doing tests and I'm playing a bit. When ready, do a pull-request and review the patch!

comment:4 Changed 2 years ago by niwi

  • Owner changed from nobody to niwi
  • Status changed from new to assigned

I've been testing, and all cases you mention work fine. In a few days I will work on the documentation and other aspects that you mentioned in the issue.

Last edited 2 years ago by niwi (previous) (diff)

comment:5 Changed 2 years ago by niwi

comment:6 Changed 2 years ago by akaariai

  • Needs documentation unset
  • Owner changed from niwi to akaariai
  • Patch needs improvement unset
  • Status changed from assigned to new

Looks excellent. I will try to find time to work this into core in the next couple of days.

comment:7 Changed 2 years ago by akaariai

  • Patch needs improvement set

I spotted two issues:

  1. When doing a deferred model .save() with using argument, where the using is to other database than where the object was fetched, the "update_fields" machinery should not be used. This is a minor corner case, but should still work.
  2. The "deferred fields" is remembered in instance._meta, and it is altered when a field is set to a value - this results in changing the "deferred fields" property for _all_ instances, not just for the instance which was changed. This is a blocker issue.

I see two ways forward:

  • remember the loaded fields (instead of deferred fields) in instance._state instead of instance._meta.
  • do not remember the deferred fields at all. Instead check just for instance._deferred. If set, then go through all fields and check if any of the fields are instances of DeferredAttribute. Those fields which are instances of DeferredAttribute are the deferred fields.

The second approach should be very robust, and I think it should be used. The check to use is:

for field in self._meta.fields:
    if isinstance(self.__class__.__dict__.get(field.attname), DeferredAttribute):
        # add to deferred fields.

See: https://github.com/akaariai/django/tree/ticket_18306 for details. There are also some other added tests.

I stumbled upon other issues while working on this ticket, see https://github.com/akaariai/django/tree/defer_inheritance_pk, #18343. Some (well at least one) of the added tests need this work to pass.

comment:8 Changed 2 years ago by niwi

You're absolutely right! The state can not be save on _meta attribute!

I use the first aproach (store deferred fields on modelinstance._state.deferred_fields), because the second aproache not works correctlly. self.__class__.__dict__ always contains DeferredAttribute, has been modified the field value or not.

With my changes and your changes merged, fails two tests with unexpected number of queries:

======================================================================
FAIL: test_update_fields_fk_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 101, in test_update_fields_fk_defer
    e1.save()
  File "../django/test/testcases.py", line 273, in __exit__
    executed, self.num
AssertionError: 2 queries executed, 1 expected

Generated queries:

(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 2 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(2, 1)
======================================================================
FAIL: test_update_fields_inheritance_defer (modeltests.update_only_fields.tests.UpdateOnlyFieldsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/niwi/django/tests/modeltests/update_only_fields/tests.py", line 89, in test_update_fields_inheritance_defer
    e1.save()
  File "../django/test/testcases.py", line 273, in __exit__
    executed, self.num
AssertionError: 4 queries executed, 1 expected

----------------------------------------------------------------------

Generated queries:

(0.000) SELECT "update_only_fields_person"."id", "update_only_fields_employee"."person_ptr_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1)
(0.000) SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
(0.000) UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" IN (SELECT U0."person_ptr_id" FROM "update_only_fields_employee" U0 WHERE U0."person_ptr_id" = 1 ); args=(1, 1)

I'm not sure it's appropriate behavior. But not for sure what would be appropriate.
All changes are in the branch mentioned in my first comment.

comment:9 Changed 2 years ago by niwi

  • Cc niwi@… added

comment:10 Changed 2 years ago by akaariai

Could you check the queries with https://github.com/django/django/pull/80 - it contains some changes which should reduce the need for queries in inheritance cases, and IIRC the tests I added relied on this.

One way to check this is to create a new branch from your local development branch of this feature ("git checkout -b test_with_pull_80" when in the local dev branch), then do "curl https://github.com/django/django/pull/80.patch |git am" and run the tests again.

Don't update your pull request or your local development branch directly with the pull-80 stuff, it should be handled separately.

I don't have time to look into the patch in detail just now.

comment:11 Changed 2 years ago by niwi

Thank you very much, I will try your changes.

Your review the patch when you have time, there is still enough time for 1.5

comment:12 Changed 2 years ago by akaariai

I did a little digging for the reasons of the test failure. The reason is that the .defer/.only code stores field.attname in the skip list, but the update_fields checks for field.name. So, the save code sees 'profile' as a loaded field even if it isn't one, and tries to save it which results in extra queries. There is also one extra query caused by missing the code from pull/80.

There are two problems here - first is that using both field.attname and field.name must work in update_only_fields argument. I don't believe this works correctly currently, at least it isn't tested. This means that both 'profile_id', and 'profile' in update_fields argument must work. This should be handled in separate ticket.

The other problem is that the .defer/.only logic in models/query.py doesn't use consitently the field.attname (or the field.name) when creating the deferred model instance. This must be fixed again in another ticket before fixing this ticket.

So, at this point we have already 3 must fix before tickets. The pull/80 issue is IMHO ready. There is a ticket for the defer/only logic for field.attname (#17485). I created a ticket for the ability to use field.attname in update_fields (ticket #18362).

comment:13 Changed 2 years ago by niwi

Good work on "Cleaned up deferred model implementation", this fixes a lot of number of queries. But continued to have a strange case:

UPDATE "update_only_fields_person" SET "name" = Linda WHERE "update_only_fields_person"."id" = 1 ; args=('Linda', 1)
SELECT "update_only_fields_employee"."person_ptr_id", "update_only_fields_employee"."profile_id" FROM "update_only_fields_employee" INNER JOIN "update_only_fields_person" ON ("update_only_fields_employee"."person_ptr_id" = "update_only_fields_person"."id") WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1,)
UPDATE "update_only_fields_employee" SET "profile_id" = 1 WHERE "update_only_fields_employee"."person_ptr_id" = 1 ; args=(1, 1)

With test code:

def test_update_fields_inheritance_defer(self):
    profile_boss = Profile.objects.create(name='Boss', salary=3000)
    e1 = Employee.objects.create(name='Sara', gender='F',
        employee_num=1, profile=profile_boss)
    e1 = Employee.objects.only('name').get(pk=e1.pk)
    e1.name = 'Linda'
    with self.assertNumQueries(3):
        e1.save()

In my opinion, this case should only run a single query. I will modify the test for 3 queryes expect, but I'm waiting for you comment me here.

Current Status:

  • #18362 now has attached patch
  • #17485 the solucion proposed seems rasonable (I update the patch for latest version)
  • #18343 now fixed and commited to master (great work)

comment:14 Changed 2 years ago by akaariai

One query seems like the correct result. I guess for some reason the profile_id isn't seen as a deferred field by the code. Unfortunately I do not have time to look into this more just now.

Thanks for all your work on this!

comment:15 Changed 2 years ago by niwi

;)

I've been debugging the code, and I found the solution at 3 queries. To test, apply first patch #18362, so that the code is consistent.

When you have time for this, review the code! Do not worry about time!

comment:16 Changed 2 years ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In [4f53e77f0720bd4d74b1b08857db8f7d32c65008]:

Fixed #18306 -- Removed most of GeoDjango-specific deployment docs

comment:17 Changed 2 years ago by claudep

  • Resolution fixed deleted
  • Status changed from closed to reopened

Sorry, wrong ticket number in commit message.

comment:18 Changed 2 years ago by akaariai

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

I have reviewed and polished the patch written by niwi. I created a pull request as I think the feature is now ready for commit. Please review, or at least do a cursory glance for obvious mistakes.

I wonder if I should squash the two commits into one...

I ended up getting rid of the model._state.deferred_fields and instead using the instance's and class' dict to check the deferred state of the object when saving. The idea of using ._state was mine (or at least encouraged by me), sorry for pushing the wrong solution.

comment:19 Changed 2 years ago by niwi

I liked more the solution keeping not diferred fields on models _state, but also I like this solution. Surely, the new implementation is better;)

Great work!

Last edited 2 years ago by niwi (previous) (diff)

comment:20 Changed 2 years ago by akaariai

The problem I faced was that there were multiple places in models/query.py where the deferred_fields would need to be calculated. The original implementation didn't take into account select_related (where get_cached_row() can return deferred instances), and qs.raw(). These just work with the current implementation. In addition, there is no additional state bookkeeping which is always a bonus. One less place for bugs to hide :)

If one wants to do dirty field state tracking, the solution is to use the model instance itself or it's state to keep the dirty field info, and then just overrider the .save() method and use update_fields in there for the dirty fields.

BTW I have been wondering if we should allow 'pk' in the update_fields. If you say "update_fields=['pk']", then the primary key will be updated as part of the query. I know there is no cascade support in Django, but one can always use custom db schemas and ON UPDATE CASCADE. Seems useful to me. But of course not this ticket's problem...

comment:21 Changed 2 years ago by niwi

Interesting! did not have in mind such cases.

Of course, the current implementation is better and takes a lot of duplication of code.

Thanks! ;)

comment:22 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In [99321e30cebbffeafc6ae19f4f92a0a665cbf19b]:

Fixed #18306 -- Made deferred models issue update_fields on save

Deferred models now automatically update only the fields which are
loaded from the db (with .only() or .defer()). In addition, any field
set manually after the load is updated on save.

comment:23 Changed 22 months ago by Anssi Kääriäinen <akaariai@…>

In dad7eec6e1c1770f5d81d5c5ed2de296c1eca969:

Corrected links to only()/defer() in Model documentation

Refs #18306

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.