Code

Opened 7 years ago

Closed 2 years ago

Last modified 8 months ago

#4102 closed New feature (fixed)

Allow UPDATE of only specific fields in model.save()

Reported by: Collin Grady <cgrady@…> Owned by: cgrady
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: update fields sql row table modified
Cc: yannvr@…, simon@…, andrewbadr.etc@…, dcramer@…, philippe.r@…, hv@…, jashugan@…, gav@…, richard.davies@…, omat@…, hwaara@…, andrehcampos@…, miracle2k, graham@…, nikitka@…, wwbj2r68n2@…, alexkoshelev, bberes, cg@…, simon@…, maxirobaina@…, kmike84@…, sfllaw@…, anssi.kaariainen@…, dtran, Kronuz, mattlong, bitwolaiye, inactivist@…, niwi@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

At present, django updates every single field when saving - this can be less than desirable performance-wise if you only need to update 1 or 2, and can help avoid some minor threading issues. (two functions that never edit the same columns can never conflict with each other using this)

This patch allows you to specify which columns should be saved, and the rest are left as they currently are in the database.

Example:

>>> p = Poll.objects.get(pk=1)
>>> p.question = "How can we help you?"
>>> p.save(['question']) # or p.save(save_fields=['question'])

The SQL generated is then only touching the question:

'UPDATE "polls_poll" SET "question"=How can we help you? WHERE "id"=1'

(from connection.queries, which is why the quoting isn't complete)

Attachments (36)

save_fields.patch (688 bytes) - added by Collin Grady <cgrady@…> 7 years ago.
4102-only-udpate-fields.patch (4.1 KB) - added by Paul Smith <paulsmith@…> 7 years ago.
Patch with unit tests and docs
4102-dirty.patch (1.7 KB) - added by Collin Grady <cgrady@…> 7 years ago.
initial shot at 'dirty' flag, seems to work so far
4102-dirty.2.patch (1.7 KB) - added by Collin Grady <cgrady@…> 7 years ago.
updated patch to only set dirty if the new value is different than current
4102-dirty.3.patch (1.8 KB) - added by Collin Grady <cgrady@…> 7 years ago.
fixed another bug - _dity check died if the field didn't exist but was being assigned to
4102-dirty.4.patch (4.9 KB) - added by Collin Grady <cgrady@…> 7 years ago.
4102-dirty.5.patch (9.3 KB) - added by Collin Grady <cgrady@…> 7 years ago.
Updated patch against [5832], fixed bug with kwargs instantiation and dirty flags
4102-dirty-set.patch (9.5 KB) - added by Collin Grady <cgrady@…> 7 years ago.
converted patch to use set on suggestion of SmileyChris, also found another bug when you used a kwargs of a field name instead of attname
4102-dirty-set.2.patch (9.4 KB) - added by Collin Grady <cgrady@…> 7 years ago.
removed keyword from __init__, added call to _wash in query.py
4102-dirty-set.3.patch (9.3 KB) - added by Collin Grady <cgrady@…> 7 years ago.
woops, removed old commented line
4102-dirty-set.4.patch (5.4 KB) - added by Collin Grady <cgrady@…> 7 years ago.
sigh, patch hates me - fixed duplicate test issue
4102-dirty-set.5.patch (5.0 KB) - added by SmileyChris 7 years ago.
Cleaner implementation: doesn't mess with init
4102-dirty-set.6.patch (5.0 KB) - added by SmileyChris 7 years ago.
small fix (a del exception is an AttributeError, not a KeyError)
4102-dirty-set.7.patch (5.8 KB) - added by Collin Grady <cgrady@…> 7 years ago.
small fix to the set creation - need to add name to the set, not the value
4102-dirty-set.8.patch (8.1 KB) - added by Collin Grady <cgrady@…> 7 years ago.
fixed another value/name mixup, optimized a few operations
4102-dirty-set.9.patch (4.9 KB) - added by Collin Grady <cgrady@…> 7 years ago.
Fixed duplicate tests due to svn glitch again
4102-timings.py (2.2 KB) - added by Collin Grady <cgrady@…> 7 years ago.
perf test script
4102-timings.2.py (3.3 KB) - added by SmileyChris 7 years ago.
new properties method, faster than noop :)
4102-timings.3.py (4.5 KB) - added by SmileyChris 7 years ago.
"hyper-optimized" method, and showing the downfall of the properties method
4102-modified.patch (5.0 KB) - added by SmileyChris 7 years ago.
Latest version of the patch, reimplemented checking against the existing value after a discussion with Collin because otherwise user input like forms/etc is going to mark everything modified regardless
4102.patch (4.8 KB) - added by Collin Grady <cgrady@…> 6 years ago.
fixed glitch with foreignkeys
4102.2.patch (6.2 KB) - added by cgrady 6 years ago.
post-qsrf patch
4102.3.patch (4.3 KB) - added by das 6 years ago.
Patch against django 1.0. Removed duplicate test.
4102.3.fix-multi-table (892 bytes) - added by clay 5 years ago.
Patches Django-1.0.2 with patch 4102.3 applied to fix multi-table inheritance
4102_django-1.1.patch (2.0 KB) - added by seandong 5 years ago.
patch for django1.1
4102-r16288.diff (5.2 KB) - added by ramiro 3 years ago.
Patch updated to r16288, code was adapted to current code status and tests were converted to unit tests.
4102_Django1.3_JL20110610.patch (6.5 KB) - added by jlorieau 3 years ago.
modified attrs save patch which deals with pre_save methods
#4102-update_modified_fields_only.diff (4.3 KB) - added by Kronuz 2 years ago.
Fixes multitable inheritance
only_fields_django-1.4.patch (4.5 KB) - added by niwi 2 years ago.
update_fields_django-1.4.patch (6.8 KB) - added by niwi 2 years ago.
update_fields_django-1.4-2.patch (9.0 KB) - added by niwi 2 years ago.
update_fields_django-1.4-3.patch (9.4 KB) - added by niwi 2 years ago.
update_fields_django-1.4-4.patch (9.9 KB) - added by niwi 2 years ago.
update_fields_django-1.5-1.patch (15.8 KB) - added by niwi 2 years ago.
update_fields_django-1.5-2.patch (16.7 KB) - added by niwi 2 years ago.
update_fields_django-1.5-3.patch (17.3 KB) - added by niwi 2 years ago.

Download all attachments as: .zip

Change History (134)

Changed 7 years ago by Collin Grady <cgrady@…>

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

I like this functionality for the stated performance reasons, but I can't help thinking that this could be done automatically by just looking for anything that's changed (although the tracking of before and after states may be more trouble than it's worth!).

If accepted, this would need a brief bit of documentation for database-api too (just a description and an example). Some tests would be good too.

comment:2 Changed 7 years ago by mtredinnick

Simon, tracking "before" vs. "after" states can't really be done automatically, since we would have to trap every attribute access to the class (which are, after all, just normal Python attribute accesses). Doing an extra database lookup to work out this information would also be bad.

I'm slightly in favour of this feature at the moment, but still thinking about it. I think it would need to be implemented with a manual list of the fields, as in Collin's patc if we did it.

Changed 7 years ago by Paul Smith <paulsmith@…>

Patch with unit tests and docs

comment:3 Changed 7 years ago by Paul Smith <paulsmith@…>

I've uploaded a patch based on Collin's original (using only_fields instead of save_fields as the keyword argument -- this implies a possible except_fields keyword argument, or perhaps exclude=False|True which defaults to False), including unit tests and docs.

Should an exception be thrown if invalid field names are given in the list?

comment:4 Changed 7 years ago by Collin Grady <cgrady@…>

This came up again at work and it was mentioned that you should be able to override __setattr__ for the meta class to trap attribute changes and set a dirty flag for various columns, which could enable some automation of this.

Changed 7 years ago by Collin Grady <cgrady@…>

initial shot at 'dirty' flag, seems to work so far

Changed 7 years ago by Collin Grady <cgrady@…>

updated patch to only set dirty if the new value is different than current

Changed 7 years ago by Collin Grady <cgrady@…>

fixed another bug - _dity check died if the field didn't exist but was being assigned to

Changed 7 years ago by Collin Grady <cgrady@…>

comment:5 Changed 7 years ago by Collin Grady <cgrady@…>

Found another issue in the previous patches - if you created an object, for instance Poll(id=1, question='What color is the sky?') and then tried to save it, it would not recognize that you had modified those attributes.

Added a kwarg to __init__ to decide if the dirty flags should be washed at the end of __init__, and modified the model creation in query.py to use the flag, so that queries from the database are not flagged as dirty, but manual object creation can be done either way.

Changed 7 years ago by Collin Grady <cgrady@…>

Updated patch against [5832], fixed bug with kwargs instantiation and dirty flags

Changed 7 years ago by Collin Grady <cgrady@…>

converted patch to use set on suggestion of SmileyChris, also found another bug when you used a kwargs of a field name instead of attname

Changed 7 years ago by Collin Grady <cgrady@…>

removed keyword from __init__, added call to _wash in query.py

Changed 7 years ago by Collin Grady <cgrady@…>

woops, removed old commented line

Changed 7 years ago by Collin Grady <cgrady@…>

sigh, patch hates me - fixed duplicate test issue

comment:6 Changed 7 years ago by SmileyChris

  • Needs tests unset

This is looking nice and shiny. With some docs, it'd be ready for core IMO.

Changed 7 years ago by SmileyChris

Cleaner implementation: doesn't mess with init

Changed 7 years ago by SmileyChris

small fix (a del exception is an AttributeError, not a KeyError)

comment:7 Changed 7 years ago by SmileyChris

Malcolm asked about the performance impact. It doesn't look good. Here's a quick mock test I created:

>>> from timeit import Timer

# A control model
>>> setup_a = '''
class Model(object):
    pass
m = Model()
'''

# A model which has a __setattr__ which does nothing
>>> setup_b = '''
class Model(object):
    def __setattr__(self, name, value):
        super(Model, self).__setattr__(name, value)
m = Model()
'''

# A model which has a __setattr__ which matches this patch
>>> setup_c = '''
class Model(object):
    def __setattr__(self, name, value):
        if name != '_modified_attrs' and (not hasattr(self, name) or
                                          value != getattr(self, name)):
            if hasattr(self, '_modified_attrs'):
                if value not in self._modified_attrs: 
                    self._modified_attrs.add(name)
            else:
                self._modified_attrs = set((value,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

>>> Timer('m.an_attribute="test"', setup_a).timeit()
0.17214338693884201

>>> Timer('m.an_attribute="test"', setup_b).timeit()
1.6591357281431556

>>> Timer('m.an_attribute="test"', setup_c).timeit()
2.6405875607088092

Changed 7 years ago by Collin Grady <cgrady@…>

small fix to the set creation - need to add name to the set, not the value

comment:8 Changed 7 years ago by Brian Rosner <brosner@…>

I would think by doing value comparisons on the data is a main cause for the overhead. To mark something dirty do it based on the setting of the attribute only and not based on what data is being changed. If I am having to update a model with a 3.4 MB chunk of data it will be comparing that data with the 3 MB that may already be in there. At that point I don't care about only updating the fields I change since it will take much more time than just updating it.

Changed 7 years ago by Collin Grady <cgrady@…>

fixed another value/name mixup, optimized a few operations

comment:9 Changed 7 years ago by SmileyChris

More performance checks:

# A model which has a __setattr__ which *really* matches this patch
>>> setup_c = '''
class Model(object):
    def __setattr__(self, name, value):
        if name != '_modified_attrs':
            if hasattr(self, '_modified_attrs'):
                if name not in self._modified_attrs: 
                    self._modified_attrs.add(name)
            else:
                self._modified_attrs = set((name,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

# Optimized for speed, and dropped the value check as Brian suggested
>>> setup_d = '''
class Model(object):
    def __setattr__(self, name, value):
        try:
            if name not in self._modified_attrs: 
                self._modified_attrs.add(name)
        except AttributeError:
            if name != '_modified_attrs':
                self._modified_attrs = set((name,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

>>> Timer('m.an_attribute="test"', setup_a).timeit()
0.17163466306471914
>>> Timer('m.an_attribute="test"', setup_b).timeit()
1.7086492836380041
>>> Timer('m.an_attribute="test"', setup_c).timeit()
2.3102257409810463
>>> Timer('m.an_attribute="test"', setup_d).timeit()
1.9328342009948187

So setup_d is only 13% slower than a plain __setattr__ override... that's not bad...

comment:10 Changed 7 years ago by SmileyChris

(actually, I dropped the value check in setup_c too, so it's still more like 2.71 for the speed of a __setattr__ which matches the current patch)

Changed 7 years ago by Collin Grady <cgrady@…>

Fixed duplicate tests due to svn glitch again

Changed 7 years ago by Collin Grady <cgrady@…>

perf test script

comment:11 follow-up: Changed 7 years ago by Collin Grady <cgrady@…>

Attached the earlier tests (slightly corrected) - it seems that the majority of the extra time used is merely from defining __setattr__, strangely enough...

Results of script, showing single vs double:

Single Assignment
Control:   0.29817199707
noop:      1.92452192307
patch:     3.07881999016
optimized: 2.29438686371

Double Assignment
Control:   0.298352003098
Noop:      1.92282295227
Patch:     3.08132410049
Optimized: 2.29805707932

The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).

If the performance difference caused by the __setattr__ stuff is too much, I think the original manual idea of being able to specify save/exclude fields in save() could be a decent compromise.

comment:12 Changed 7 years ago by anonymous

BTW the latest patch obviously is very backwards-incompatible for anyone using obj.dict.update() to update a lot of attributes at once, or using obj.dict.update(form.cleaned_data) as I've seen on a few blog posts.

Changed 7 years ago by SmileyChris

new properties method, faster than noop :)

comment:13 in reply to: ↑ 11 Changed 7 years ago by SmileyChris

Replying to Collin Grady <cgrady@the-magi.us>:

The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).

It's because you were doing it in set up, not in the actual test!

Results of script:

Single Assignment
Control:   0.168
noop:      1.689 (10.0x)
patch:     2.581 (15.3x)
optimized: 1.872 (11.1x)
property:  1.494 (8.9x)

Double Assignment
Control:   0.368
noop:      3.400 (9.2x)
patch:     5.222 (14.2x)
optimized: 3.887 (10.6x)
property:  2.951 (8.0x)

Changed 7 years ago by SmileyChris

"hyper-optimized" method, and showing the downfall of the properties method

comment:14 Changed 7 years ago by SmileyChris

Ok, so I've got the timing down to under 6x control.

I think it's also important to mention that the writing of an attribute isn't really where most of the time is going to be taken up anyway - it's not like this is slowing down model access or saving by any real measure.

Single Assignment
Control:   0.170
noop:      1.939 (11.4x)
patch:     2.566 (15.1x)
optimized: 1.852 (10.9x)
noop-h-op: 0.787 (4.6x)
h-optimiz: 0.988 (5.8x)
property:  1.473 (8.6x)
new-attr:  1.779 (10.4x)

Retreiving
Control:   0.139
noop:      0.137 (1.0x)
patch:     0.177 (1.3x)
optimized: 0.162 (1.2x)
noop-h-op: 0.136 (1.0x)
h-optimiz: 0.135 (1.0x)
property:  0.787 (5.7x)
new-attr:  0.992 (7.2x)

Changed 7 years ago by SmileyChris

Latest version of the patch, reimplemented checking against the existing value after a discussion with Collin because otherwise user input like forms/etc is going to mark everything modified regardless

comment:15 follow-up: Changed 7 years ago by andraz@…

Two things:

  1. while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field.

This is a great debugging mechanism to weed out incidental typos which usually lead to quiet dataloss until you notice it at some time much too late.

  1. Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare? I don't really believe that is the case which people need this db performance advantage for... Usually save() gets performance-problematic when you want to change your data in bulk... and you just need to update a counter or a date field for thousands of records, but don't want to touch all the textfields

comment:16 in reply to: ↑ 15 Changed 7 years ago by SmileyChris

Replying to andraz@zemanta.com:

Two things:

  1. while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field.

That is well outside of the scope of this ticket.

  1. Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare?

You're not being very clear, but I think you are asking why I re-implemented the change to check against existing values.

The point in that change isn't optimization but to help with the minor threading issues. And I think you may have overlooked the fact that it only does this comparison for fields when they are being set to ensure they really are "changing".

comment:17 Changed 6 years ago by Yopi <yannvr@…>

  • Cc yannvr@… added

comment:18 Changed 6 years ago by Simon Litchfield <simon@…>

  • Cc simon@… added

Why not just set self.db_values = db_values under save() and get() etc? Then compare current values to db_values during the loop that compiles the INSERT/UPDATEs?

comment:19 Changed 6 years ago by Collin Grady <cgrady@…>

It would likely be less efficient to store an entire second copy of all model data when you can simply store a modified flag. Especially if you had a large number of fields or fields with large amounts of data in them.

comment:20 Changed 6 years ago by Simon Litchfield <simon@…>

Sorry just read Brian's point, re the big blob data scenario. Agreed setattr overloading does sound better in that light, +1 :-)

Another point. Rather than suggest it should be kept _private, why not make it clearly available and show in the docs. It will be very useful for deciding when to update related tables or not, during save() overrides etc.

See this is ugly --

def save(self):
  if not self.id:
    retag = True
  else:
    # Unfortunately, we need to hit the db to see if things have changed
    check = Tag.objects.get(pk=self.id)
    retag = (check.name != self.name)
  super(Tag, self).save()
  if retag:
    self.retag_articles()

Be great to be able to do something like this --

def save(self):
  retag = ('name' in self.modified_fields)
  super(Tag, self).save()
  if retag:
    self.retag_articles()

comment:21 Changed 6 years ago by Collin Grady <cgrady@…>

I don't see how anything stops you from reading it like that already with this patch, it isn't a __ var, and you would then be internal to the model anyway, so it seems fine to me to read from it as is :)

comment:22 Changed 6 years ago by Andrew Badr <andrewbadr.etc@…>

  • Cc andrewbadr.etc@… added

comment:23 Changed 6 years ago by SmileyChris

  • Patch needs improvement set

Need to write a new patch

(22:09:40) Magus-: SmileyChris: oh, btw, your most recent patch for 4102 is busted, I forgot to talk to you about it earlier :)
(22:09:50) SmileyChris: oh noes!
(22:09:56) SmileyChris: Magus-: what did I bust?
(22:09:59) Magus-: you can't use self.__dict__ like that
(22:10:03) Magus-: it breaks on fkeys and other things
(22:10:09) Magus-: well, only fkeys so far

comment:24 Changed 6 years ago by (removed)

  • Cc ferringb@… added

comment:25 Changed 6 years ago by dcramer

  • Cc dcramer@… added

Changed 6 years ago by Collin Grady <cgrady@…>

fixed glitch with foreignkeys

comment:26 Changed 6 years ago by PhiR

  • Cc philippe.r@… added

comment:27 Changed 6 years ago by guettli

  • Cc hv@… added

comment:28 Changed 6 years ago by cgrady

  • Patch needs improvement unset

Changed 6 years ago by cgrady

post-qsrf patch

comment:29 Changed 6 years ago by Simon Willison

You've obviously spent a lot of time investigating the performance overhead incurred by over-riding setattr, but have you done any benchmarking of the performance improvement gained by performing significantly smaller UPDATE statements? It may well be that the improvement from the more efficient UPDATEs makes the performance overhead of setattr irrelevant.

comment:30 Changed 6 years ago by jashugan

  • Cc jashugan@… added

comment:31 Changed 6 years ago by anonymous

  • Cc gav@… added

comment:32 Changed 6 years ago by cgrady

  • Owner changed from nobody to cgrady

comment:33 Changed 6 years ago by Richard Davies <richard.davies@…>

  • Cc richard.davies@… added

comment:34 Changed 6 years ago by (removed)

  • Cc ferringb@… removed

comment:35 Changed 6 years ago by das

  • Cc das@… added

Changed 6 years ago by das

Patch against django 1.0. Removed duplicate test.

comment:36 Changed 6 years ago by puissdir

I've a table where some columns are read-only (have SELECT, but no UPDATE privilege on some columns).

Before that patch I was unable to modify an object and save() it, as save() tried to write all the fields, thus triggering an exception.

Now I can, thanks !!! (at the cost of enumerating in each save() all the fields I can/want update).

Maybe this 'read-only' attribute, per column basis, could be specified in the meta class.

Regards

comment:37 Changed 5 years ago by anonymous

  • Cc omat@… added

comment:38 Changed 5 years ago by anonymous

  • Cc hwaara@… added

comment:39 Changed 5 years ago by clay

  • Patch needs improvement set

Patch 4102.3 does not work correctly with multi-table inheritance -- child._modified_attrs is not visible by the parent, so a modified parent field is not updated. Something like this fixes it:

Index: third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py
===================================================================
--- third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py (revision 3362)
+++ third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py (working copy)
@@ -358,7 +358,10 @@

non_pks = [f for f in meta.local_fields if not f.primary_key]
modified_attrs = self._modified_attrs
non_pks = [f for f in non_pks if (f.name in modified_attrs or f.attname in modified_attrs)]

  • self._reset_modified_attrs()

+
+ # Reset modified attribute accumulator after parent fields have been updated
+ if cls == self.class:
+ self._reset_modified_attrs()

# First, try an UPDATE. If that doesn't update anything, do an INSERT.
pk_val = self._get_pk_val(meta)

Changed 5 years ago by clay

Patches Django-1.0.2 with patch 4102.3 applied to fix multi-table inheritance

comment:40 Changed 5 years ago by clay

Apologies for the formatting snafu above. I've uploaded the patch as attachment 4102.3.fix-multi-table.

comment:41 Changed 5 years ago by clay

  • Cc clay@… added

comment:42 Changed 5 years ago by anonymous

  • Cc andrehcampos@… added

Changed 5 years ago by seandong

patch for django1.1

comment:43 Changed 5 years ago by seandong

  • Has patch unset
  • milestone set to 1.1

If query at the same time use function 'select_related' and 'only', it will be wrong.

s = Station_Info.objects.select_related('player_info').only('name','player_infonickname').get(pk=1)
s.name='test11'
s.save()

Traceback (most recent call last):

File "<console>", line 1, in <module>
File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 420, in save

self.save_base(force_insert=force_insert, force_update=force_update)

File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 462, in save_base

self.save_base(cls=parent, origin=org)

File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 487, in save_base

rows = manager.filter(pk=pk_val)._update(values)

File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 449, in _update

return query.execute_sql(None)

File "/usr/lib/python2.6/dist-packages/django/db/models/sql/subqueries.py", line 120, in execute_sql

cursor = super(UpdateQuery, self).execute_sql(result_type)

File "/usr/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 2385, in execute_sql

cursor.execute(sql, params)

File "/usr/lib/python2.6/dist-packages/django/db/backends/util.py", line 19, in execute

return self.cursor.execute(sql, params)

File "/usr/lib/python2.6/dist-packages/django/db/backends/mysql/base.py", line 89, in execute

raise Database.IntegrityError(tuple(e))

IntegrityError: (1048, "Column 'map_info_id' cannot be null")

comment:44 Changed 5 years ago by russellm

  • Has patch set
  • milestone 1.1 deleted

Please don't mark tickets against the v1.1 milestone unless they are critical failures. New features are _definitely_ not for the milestone.

comment:45 Changed 4 years ago by miracle2k

  • Cc miracle2k added

comment:46 Changed 4 years ago by russellm

#12172 raised a side effect that should be a logical consequence of implementing this feature.

comment:47 Changed 4 years ago by graham_king

  • Cc graham@… added

comment:48 Changed 4 years ago by nmk

  • Cc nikitka@… added

comment:49 Changed 4 years ago by dulepov

  • Cc wwbj2r68n2@… added

comment:50 Changed 4 years ago by dulepov

  • Cc wwbj2r68n2@… added; wwbj2r68n2@… removed

comment:51 Changed 4 years ago by das

  • Cc das@… removed

comment:52 Changed 4 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:53 Changed 4 years ago by Alex

  • Triage Stage changed from Design decision needed to Accepted

comment:54 Changed 4 years ago by bberes

  • Cc bberes added

comment:55 Changed 4 years ago by cgrebs

  • Cc cg@… added

comment:56 Changed 3 years ago by simon29

  • Cc simon@… added

comment:57 Changed 3 years ago by maxi

  • Cc maxirobaina@… added

comment:58 Changed 3 years ago by lrekucki

  • Severity set to Normal
  • Type set to New feature

Changed 3 years ago by ramiro

Patch updated to r16288, code was adapted to current code status and tests were converted to unit tests.

comment:59 Changed 3 years ago by ramiro

  • Easy pickings unset
  • Keywords update fields sql row table modified added
  • Patch needs improvement unset

Changed 3 years ago by jlorieau

modified attrs save patch which deals with pre_save methods

comment:60 follow-up: Changed 3 years ago by jlorieau

  • UI/UX unset

There are a few bugs in this feature that are fixed with my patch (4102_Django1.3_JL20110610.path).

Field pre_save methods, like the DateTime auto_now, can change the value of a field just before a save, and these changes are not included in the modified_attrs list. I've made modifications to account for changes in field values by pre_save, and I've included a test to account for this change.

I've made a few other changes, which improve and fix bugs in this new feature:

  1. In the __setattr__ method (base.py, line 370), I retrieve the _modified_attrs list using a setdefault method on the object __dict__. This is needed for situations when a model's __init__ has been overloaded and attribute access occurs before the parent __init__ is called.
  1. I've moved the _reset_modified_attrs call in the save_base to line 580 (base.py, line 580). Some Field pre_save methods, including some that I've developed, need access to the _modified_attrs list before it is cleared.

In addition, I have a few suggestions that I believe would enhance this feature.

I propose that a 'save_all' flag be included in the save argument list. It could be used to revert to the original behavior to save all fields at once. Furthermore, a setting in the settings.py could change the default behavior of the model to save only modified fields or all.

The new feature could be extended with a new signal named TrackModification or TrackChange. On object creation, modification, or deletion, a signal would be emitted with a list of the modified attributes--these would, of course, have to include fields modified by pre_save. Alternatively, this change could be integrated into the post_save signal. I propose this as a feature because it would be relatively easy to implement, and very useful to have.

comment:61 in reply to: ↑ 60 Changed 3 years ago by jlorieau

Replying to jlorieau:

The new feature could be extended with a new signal named TrackModification or TrackChange. On object creation, modification, or deletion

Just to clarify on this point, I am proposing to emit a signal only when save (and possibly __init__ and delete) is called, and not on every invocation of __setattr__. Basically, we could include the _modified_attrs information to existing signals (post_save and post_init) for free, making them much more powerful--for example, in invalidating a cached attribute only when that attribute actually changes.

This would be a new ticket that builds on this feature, if this feature is approved.

comment:62 Changed 3 years ago by kmike

  • Cc kmike84@… added

comment:63 Changed 3 years ago by sfllaw

  • Cc sfllaw@… added

comment:64 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added
  • Patch needs improvement set

Some quick notes:

  • I don't think this works when using deferred field loading
  • The _modified_attrs should go to the model._state, not directly to the model
  • The _state.adding should maybe play a part in this. What about: if we are adding, then every field has changed by default? Maybe this is not so hot idea, but could be worth exploring.
  • There is a trick one can use to speed up model __init__:
    # Pseudocode
    
    class Model():
        def __setattr__(args):
            ...
        _my_setattr = __setattr__
        
        def __init__(self, ...):
            # Check if inheriting class has defined a __setattr__ method
            if self._my_setattr == self.__setattr__:
                # If not, we can skip our own __setattr__ and use
                # directly the super __setattr__
                _set = super(Model, self).__setattr__
            else:
                # We need to call the __setattr__ defined in inheriting class
                # (backwards compatibility).
                _set = self.__setattr__
            ...
            for field, val in izip(fields, args):
                _set(self, field.attname, val)
            ...
               
    

This way if there is no overriding __setattr__ there is almost no speed loss in Model's __init__. On the other hand this is a dirty trick that could easily bite...

Also a check could be done to guard against overriding __setattr__ doing direct __dict__ manipulation and not calling our __setattr__. On Django startup create a instance of every Model class and see if a modification registers to _modified_attrs. If not, raise an Exception. Python documentation says that __setattr__ should call the base __setattr__ for new style classes, but it is a safe bet that there are users who do not do that.

comment:65 Changed 3 years ago by akaariai

Correcting myself: Actually, I do think this works with deferred attributes, the deferred loading of attributes accesses directly the __dict__ of the instance, and if the attribute has not been accessed, save() will not access it. This is actually a big plus to this feature: if one uses .only() to load just the attributes needed in data manipulation and then saves the object back, things work as expected. Currently all the deferred attributes will be fetched before save, one at a time if I am not mistaken...

On the other hand, there is a minor backwards incompatibility. The following is valid at the moment (although not necessarily very wise thing to do):

# Lets fetch an object and update another object with it's data. 
m = M.objects.get(pk=1)
m.pk = 2 # assume pk = 2 is in the DB.
m.save() # Old code would have updated all attributes, new code doesn't.

This could be fixed by setting all attrs as changed when the pk has changed.

comment:66 Changed 2 years ago by dtran

  • Cc dtran added

comment:67 Changed 2 years ago by Kronuz

  • Cc Kronuz added

With the current patch in multitable inheritance, when you modify an already existing object and save it, it only saves data to the parent. This happens because _reset_modified_attrs() is called during the topmost first parent's save_base() call and further updates are ignored since self._modified_attrs no longer have anything after that. I'm attaching a new patch to fix this.

Last edited 2 years ago by Kronuz (previous) (diff)

Changed 2 years ago by Kronuz

Fixes multitable inheritance

comment:68 Changed 2 years ago by akaariai

The patch has a fundamental problem. Some field types are mutable in-place. This means that the check in __setattr__ will never see a change if the value is mutated in-place, and hence this approach does not work for in-place mutable fields. See this post for some discussion related to this problem & some ideas how to solve the problem.

comment:69 Changed 2 years ago by mattlong

After reviewing the history of this ticket, it seems what started as a simple opt-in feature request of adding a field white-list to the Model's save function morphed into a complicated dirty flag approach that obviously has many edge cases and performance implications given that this ticket has been open for 5 years now. Just compare the very first proposed patch to the latest to see what I mean.

Unfortunately, this seemed to have happened without much discussion about the relative merits of each approach. I would prefer the Django ORM leave it to me to decide which fields I would like to update rather than relying on it to correctly infer which fields I would like updated. Clearly some people feel differently and favor the dirty flag approach for a more hands-off approach. As such, I propose adding support for both methods so that developers can choose the right approach for their use case.

Last edited 2 years ago by mattlong (previous) (diff)

comment:70 Changed 2 years ago by mattlong

  • Cc mattlong added

comment:71 Changed 2 years ago by akaariai

Implementing only_fields kwarg for model .save() sounds like a plan. That would be really useful feature, simple to implement and I think nobody will give a -1 to that idea.

In ticket #17332 is a patch which should allow (with minor changes) override of model._state. Also, the ModelState does have some methods in it which allow tracking of dirty field state (of course, implemented in a custom state class). That way 3rd party projects could pretty easily implement tracking of dirty field state in the way they wish (hashing, flags, storing the old values...). Using the ._state information you can then implement custom .save() which automatically saves only changed fields. Note that the important parts of the patch are the ModelState changes, and where .update()/clear() are called, rest is irrelevant to this ticket.

It seems clear that a default behavior of saving only changed fields will not get into Django core.

comment:72 Changed 2 years ago by bitwolaiye

  • Cc bitwolaiye added

when I use #4102-update_modified_fields_only.diff I meet a question that item.save() will update all field in the first time, but update specific changed field in the next.
so I change "self._reset_modified_attrs()" in the end of init not in the top of init. because in init setattr will be called, and problem is come.

comment:73 Changed 2 years ago by inactivist

  • Cc inactivist@… added

+1 mattlong's comment 69.

Last edited 2 years ago by inactivist (previous) (diff)

comment:74 Changed 2 years ago by niwi

I have been researching the same, and I agree with akaariai: implementing only_fields kwarg for model .save() sounds like a plan. Furthermore, it is simpler and more explicit.

Attached the patch based on the original and modified for the current version.

Changed 2 years ago by niwi

comment:75 Changed 2 years ago by akaariai

  • Needs tests set

A couple of design considerations:

  • if only fields is given should it imply force_update=True. My take is yes, as I don't think there will be a use case for only_fields + insert. Set the field values to NULL instead. In addition only_fields + insert could lead to some complicated situations. So, maybe the kwarg should instead be update_fields?
  • what if the model does not have fields given in the update_fields argument? Should it be an error or not. I guess erroring out could be wise, as otherwise this could hide data loss bugs. This should probably be done in .save() as a pre-check, and then .save_base just uses the given set without further checks.
  • It might make sense to add the field set to pre/post save signals as an argument if this is doable from backwards compatibility viewpoint.
  • Should a deferred model which is then saved have automatically update_fields set so that only the non-deferred fields will be saved. I think currently the fields are loaded from the database and then the update will update the fields to the just fetched values which does not make sense.

This should be tested with model inheritance, too. It is a common pain-point for this kind of feature. The current test doesn't actually seem to test that the name is updated, so in general more tests needed.

comment:76 Changed 2 years ago by niwi

  • Cc niwi@… added

Interesting considerations. I will perform more tests with your considerations in mind.

comment:77 Changed 2 years ago by niwi

Here the new patch with more tests and some considerations implemented.

Changed 2 years ago by niwi

comment:78 Changed 2 years ago by akaariai

Looks good (although I haven't done a full review). Skipping deferred model handling is OK, it does actually contain some possible issues (load deferred model, set attributes from a form, save() must work). I think this could be dealt with, but lets just leave it out for now.

The pre/post save signal should get a new argument. There is some precedence to adding arguments to signals ("using" in 1.3 for example) so this should not be impossible to do from backwards compatibility point of view. The .save() is also documented to take kwargs, so changing save's signature should be OK, too.

comment:79 Changed 2 years ago by niwi

Here the last patch, with the parameters for signals.

Changed 2 years ago by niwi

comment:80 follow-up: Changed 2 years ago by akaariai

The patch looks good to me, I can't spot any serious failures there. Some corner cases below. I would value other opinions on the issues below.

The patch still lacks the update_fields implies force_update=True part. Does somebody know of a valid use case for "only fields" behavior for inserts? My main concern here is that this breaks default values. Another option (as is done in the patch) is to not imply force_update, but do full save if the save leads to insert, so that the update_fields is in effect only for updates. The documentation is currently misleading about the behavior.

My take on this is to set force_update if update_fields is not None. There are two main use cases for this feature: performance (in which case you do want force_update=True), and "update only these fields, leave others as they are in DB". In the latter case doing an insert would be an error. On the other hand requiring the user to set the force_update=True is risk free from Django's perspective.

Should datetime fields with auto_now=True be always included in the fields list automatically?

If the update_fields is supplied but is empty shouldn't the save be skipped completely in that case (or if it contains only the PK field)?

And still one more thing: the update_fields is passed to the signal handler but its type is not known. If it happens to be a list (or any other mutable container), the signal handler can modify it in place. There is no safety checks for modifications done there. I am not sure if signal handlers being able to modify the list is useful or not, but at least the passed in type should be consistent: either a list or a tuple (actually, set or immutable set could make more sense).

EDIT: Just noted that there is .delete() calls in the end of the test methods: no need for that, Django's TestCase will do the cleanup for you.

Last edited 2 years ago by akaariai (previous) (diff)

comment:81 Changed 2 years ago by clay

  • Cc clay@… removed

comment:82 in reply to: ↑ 80 ; follow-up: Changed 2 years ago by niwi

Replying to akaariai:

The patch looks good to me, I can't spot any serious failures there. Some corner cases below. I would value other opinions on the issues below.

The patch still lacks the update_fields implies force_update=True part. Does somebody know of a valid use case for "only fields" behavior for inserts? My main concern here is that this breaks default values. Another option (as is done in the patch) is to not imply force_update, but do full save if the save leads to insert, so that the update_fields is in effect only for updates. The documentation is currently misleading about the behavior.


I've been thinking about this and I think you're right! update_fields have to implies force_update=True. Although, I agree considerations!


My take on this is to set force_update if update_fields is not None. There are two main use cases for this feature: performance (in which case you do want force_update=True), and "update only these fields, leave others as they are in DB". In the latter case doing an insert would be an error. On the other hand requiring the user to set the force_update=True is risk free from Django's perspective.

Should datetime fields with auto_now=True be always included in the fields list automatically?


At this point, I think we should not automatically preselected fields "DateTimeField" with "auto_now = True". This option is for advanced use, and in this case one must know what is really saving.


If the update_fields is supplied but is empty shouldn't the save be skipped completely in that case (or if it contains only the PK field)?


If the update_fields is supplied but is empty, now, will have the same behavior when update_fields is not supplied. In my opinion the behavior is correct.


And still one more thing: the update_fields is passed to the signal handler but its type is not known. If it happens to be a list (or any other mutable container), the signal handler can modify it in place. There is no safety checks for modifications done there. I am not sure if signal handlers being able to modify the list is useful or not, but at least the passed in type should be consistent: either a list or a tuple (actually, set or immutable set could make more sense).


I think it should be a mutable object. Thus, if someone needs it, will have the ability to automatically add to the list datetime fields with auto_now = True.

I do not understand what you mean by specifying the type. If the parameter is None, the signal receives None. If the parameter list/tuple, the signal receives list/tuple.

Are you thinking about some kind of type checking?


EDIT: Just noted that there is .delete() calls in the end of the test methods: no need for that, Django's TestCase will do the cleanup for you.


I updated the documentation, but my English is not the best in the world. If you can improve it, I would be very grateful.

Changed 2 years ago by niwi

comment:83 in reply to: ↑ 82 Changed 2 years ago by akaariai

  • Needs documentation unset
  • Needs tests unset

Replying to niwi:

At this point, I think we should not automatically preselected fields DateTimeField with "auto_now = True". This option is for advanced use, and in this case one must know what is really saving.

Agreed. This is like qs.update() - no addition of auto_now fields there.

If the update_fields is supplied but is empty, now, will have the same behavior when update_fields is not supplied. In my opinion the behavior is correct.

I am not sure about this one. Main concern here is that removing a field from update_fields can lead to implicit addition of all the model's fields to the update_fields which feels counter-intuitive.

I think it should be a mutable object. Thus, if someone needs it, will have the ability to automatically add to the list datetime fields with auto_now = True.

I do not understand what you mean by specifying the type. If the parameter is None, the signal receives None. If the parameter list/tuple, the signal receives list/tuple.

Are you thinking about some kind of type checking?

The problem is if the user gives update_fields as a tuple then the signal handler can not mutate it in place - if it is a set you can mutate it with add - if it is a list then you can append to it. So, I am thinking of converting the update_fields to a common type for the signal handler. However not converting can be seen as a feature, too.

I updated the documentation, but my English is not the best in the world. If you can improve it, I would be very grateful.

I will see what I can do. The most important thing right now is having somewhat correct information in correct places, not grammatical accuracy.

If I am not mistaken there are two open items:

  1. Should empty, but not None update_fields argument skip save completely instead of doing a normal save. My feeling is skip the save here.
  2. Should signal handlers be able to mutate the update_fields? It is possible to leave this up to the caller of .save() - if a list was given as update_fields, then it is mutable in the signal handler - if a tuple was given then it isn't. Maybe leaving this as is (but checking that isinstance(update_fields, (list, tuple))) is a good solution here.

I am leaving the "patch needs improvement" checked on grounds of not actually testing the patch...

comment:84 Changed 2 years ago by niwi

I implemented the last considerations. And I added some more tests.

Thanks for your attention. ;)

Changed 2 years ago by niwi

comment:85 Changed 2 years ago by niwi

Here is the latest patch, reviewed by akaariai in the pull request (https://github.com/django/django/pull/4)

Changed 2 years ago by niwi

comment:86 Changed 2 years ago by niwi

Current pull request in one commit https://github.com/django/django/pull/33

comment:87 follow-up: Changed 2 years ago by aaugustin

The code looks generally sane to me.

A few remarks:

  • Does the code behave properly with related models -- ie. you can save(update_fields=['related']) rather than save(update_fields=['related_id'])? I haven't seen a test for this.
  • There are tests for inherited models, which is a good idea. I'm not sufficiently familiar with the implementation of the ORM to tell if other features (like proxy models) are worth testing. If you think they are, a few more tests would be nice.
  • This isn't a minor feature, it deserves a full paragraph in the release notes.
  • tests/modeltests/update_only_fields/__init__.py should be totally empty.

comment:88 in reply to: ↑ 87 Changed 2 years ago by niwi

Replying to aaugustin:

The code looks generally sane to me.

A few remarks:

  • Does the code behave properly with related models -- ie. you can save(update_fields=['related']) rather than save(update_fields=['related_id'])? I haven't seen a test for this.

I will check this part and add more tests if necessary.

  • There are tests for inherited models, which is a good idea. I'm not sufficiently familiar with the implementation of the ORM to tell if other features (like proxy models) are worth testing. If you think they are, a few more tests would be nice.

In my opinion, this does not affect these things. Only does the filter fields that enter the UPDATE statement. The rest of the default behavior is as if suddenly the model happens to have less fields than actually have.
In any case, I see to do more tests to be sure.

  • This isn't a minor feature, it deserves a full paragraph in the release notes.

This part of me is complicated for me. My English is not very good.

  • tests/modeltests/update_only_fields/__init__.py should be totally empty.

;)

comment:89 Changed 2 years ago by niwi

Changes:

  • add tests for proxied models.
  • remove all content from tests/modeltests/update_only_fields/init.py

I confirm, testcases covers the first point (save related objects) and works correctly.

Changed 2 years ago by niwi

comment:90 Changed 2 years ago by akaariai

I just spotted that m2m fields are allowed in the "update_fields" arg. I guess they have to be forbidden, as this could lead to confusion: "I defined 'm2m' in the update_fields, but it did not get updated. Whats going on?". Maybe removing the PK field should be done, too. You can't actually update the primary key currently, and allowing it in the update_fields is misleading.

Maybe the right fix is to use something like [field.name for field in self._meta.fields if not field.primary_key] instead of get_all_field_names(). Not tested...

I can take care of the above points when committing the patch. Of course, if you want to do the work I am more than happy to let you do it... :)

comment:91 Changed 2 years ago by niwi

I tried your suggestion and give the desired results. Now the fields are no longer allowed m2n. I added the corresponding test.

Regarding github and pull-requests ... always need to be all in a single commit? actually I'm a bit confused, and I hate to be opening pull-requests different each time you make a change or commit.

With this change I created a new branch, with the 3 commits I've made for this issue. And upload the patch here.

Unfortunately, documentation / paragraph in the release notes, I'll leave to you. I do not feel good ability to write this text.

Changed 2 years ago by niwi

comment:92 Changed 2 years ago by akaariai

I will take this ticket from here.

We don't have any official guidelines on how to work on github, but here is an approach I like.

# Assume you have django/django as "upstream" remote in your local git repo.
# (if not: git add remote upstream git@github.com:django/django.git)

# Start work on ticket.
# Work in ticket_4102, make it track upstream/master branch.
git checkout -b ticket_4102 upstream/master
# do some commits:
git add files
git commit -m 'A commit message...'
# check for whitespace errors
git diff --check upstream/master

# Spot some errors, correct files and commit
git add files
git commit -m 'DOH! whitespace error!'

# The above commit should not go into Django's repository, so squash it
# (HEAD~2 refers to last two commits on this branch)
git rebase -i HEAD~2
# pick the original commit ("A commit message..."), squash (s as first char on line) the "DOH!" commit.
# reword commit messages, usually keep the first message as is, drop the second message.
# Run (relevant) tests!
# ok, seems the work is ready, push it to your github repo (assumed to be origin here)
git push origin ticket_4102

# make a pull request in github.
# An annoying core-developer spots a world-ending issue (another whitespace error).
# And, there have been some commits to the django/django repo too, so you need to update your patch to head.
# In your local branch ticket_4102 rebase your work to current upstream master
git fetch upstream
git rebase
# If conflicts above, correct & continue.
# correct & commit the whitespace error, repeat the rebase -i step above
# Run (relevant) tests!

# Now, push again to origin. You have rebased your work. This changes the commit
# history, and those who based their work on your work will hate you (they need to
# rebase their work, too). However, this does not matter, as your branch is a
# "topic branch" and nobody should trust your branch to be stable in commit history.
# So force push changes:
git push --force origin ticket_4102

# You have your work in a single corrected commit in the original pull request! 

The above isn't tested.

Commit message guidelines (in short): A summary line of 50 chars. Separated with an empty line, paragraphs of 70 chars per line detailing the commit. Limits are soft, so if need be, break the characters per line rules.

A single commit always? No, separate your work into logical "blocks". For this ticket, there is just one logical block, so a single commit.

If the above seems confusing or a lot of work to do, here is another approach: At some point hand over your work to a committer who will finish it off.

As said, the above isn't official Django workflow, just my approach...

comment:93 Changed 2 years ago by niwi

Thank you very much! It has helped me a lot to understand how git rebase works.

comment:94 follow-up: Changed 2 years ago by akaariai

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

Pull 41 contains an RFC patch - I will wait for a day or so for final comments on the patch.

comment:95 in reply to: ↑ 94 Changed 2 years ago by russellm

Replying to akaariai:

Pull 41 contains an RFC patch - I will wait for a day or so for final comments on the patch.

I've just given the code a quick look over; other than a couple of minor typos (comments on the diff), it looks good to me. Congratulations to all involved in killing another 5 year old Django dragon!

comment:96 Changed 2 years ago by Andrei Antoukh

  • Resolution set to fixed
  • Status changed from new to closed

Fixed #4102 -- Allow update of specific fields in model.save()

Added the ability to update only part of the model's fields in
model.save() by introducing a new kwarg "update_fields". Thanks
to all the numerous reviewers and commenters in the ticket

Changeset: 365853da016f242937a657b488514e2f69fa6d82

comment:97 Changed 8 months ago by anonymous

Any interest in addressing the original interest of this ticket again? django-save-the-changes is one possible solution. I also opened a ticket on that project about the feasibility on porting it to core Django.

comment:98 Changed 8 months ago by russellm

If, by referring to the "original" interest of the ticket, you're referring to automatically tracking the 'modified' fields -- then the answer is probably no. If you read the ticket history, you'll see why we didn't implement this approach, so unless you've got a new approach to the implementation, there isn't much to discuss.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.