Django

Code

Ticket #4102 (new)

Opened 1 year ago

Last modified 2 weeks ago

Allow UPDATE of only specific fields in model.save()

Reported by: Collin Grady <cgrady@the-magi.us> Assigned to: nobody
Milestone: Component: Database wrapper
Version: SVN Keywords:
Cc: yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu, hv@tbz-pariv.de, jashugan@gmail.com Triage Stage: Design decision needed
Has patch: 1 Needs documentation: 1
Needs tests: 0 Patch needs improvement: 0

Description

At present, django updates every single field when saving - this can be less than desirable performance-wise if you only need to update 1 or 2, and can help avoid some minor threading issues. (two functions that never edit the same columns can never conflict with each other using this)

This patch allows you to specify which columns should be saved, and the rest are left as they currently are in the database.

Example:

>>> p = Poll.objects.get(pk=1)
>>> p.question = "How can we help you?"
>>> p.save(['question']) # or p.save(save_fields=['question'])

The SQL generated is then only touching the question:

'UPDATE "polls_poll" SET "question"=How can we help you? WHERE "id"=1'

(from connection.queries, which is why the quoting isn't complete)

Attachments

save_fields.patch (0.7 kB) - added by Collin Grady <cgrady@the-magi.us> on 04/20/07 20:57:57.
4102-only-udpate-fields.patch (4.1 kB) - added by Paul Smith <paulsmith@pobox.com> on 05/06/07 20:31:59.
Patch with unit tests and docs
4102-dirty.patch (1.7 kB) - added by Collin Grady <cgrady@the-magi.us> on 06/21/07 23:42:47.
initial shot at 'dirty' flag, seems to work so far
4102-dirty.2.patch (1.7 kB) - added by Collin Grady <cgrady@the-magi.us> on 06/22/07 19:05:52.
updated patch to only set dirty if the new value is different than current
4102-dirty.3.patch (1.8 kB) - added by Collin Grady <cgrady@the-magi.us> on 06/22/07 23:05:11.
fixed another bug - _dity check died if the field didn't exist but was being assigned to
4102-dirty.4.patch (4.9 kB) - added by Collin Grady <cgrady@the-magi.us> on 06/29/07 04:14:56.
4102-dirty.5.patch (9.3 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/08/07 23:54:12.
Updated patch against [5832], fixed bug with kwargs instantiation and dirty flags
4102-dirty-set.patch (9.5 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/09/07 00:24:33.
converted patch to use set on suggestion of SmileyChris?, also found another bug when you used a kwargs of a field name instead of attname
4102-dirty-set.2.patch (9.4 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/09/07 17:03:07.
removed keyword from __init__, added call to _wash in query.py
4102-dirty-set.3.patch (9.3 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/09/07 17:03:55.
woops, removed old commented line
4102-dirty-set.4.patch (5.4 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/09/07 17:22:37.
sigh, patch hates me - fixed duplicate test issue
4102-dirty-set.5.patch (5.0 kB) - added by SmileyChris on 08/16/07 16:46:26.
Cleaner implementation: doesn't mess with init
4102-dirty-set.6.patch (5.0 kB) - added by SmileyChris on 08/16/07 17:08:44.
small fix (a del exception is an AttributeError, not a KeyError)
4102-dirty-set.7.patch (5.8 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/21/07 23:24:47.
small fix to the set creation - need to add name to the set, not the value
4102-dirty-set.8.patch (8.1 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/22/07 03:40:45.
fixed another value/name mixup, optimized a few operations
4102-dirty-set.9.patch (4.9 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/23/07 16:07:01.
Fixed duplicate tests due to svn glitch again
4102-timings.py (2.2 kB) - added by Collin Grady <cgrady@the-magi.us> on 08/23/07 16:08:09.
perf test script
4102-timings.2.py (3.3 kB) - added by SmileyChris on 10/01/07 23:30:03.
new properties method, faster than noop :)
4102-timings.3.py (4.5 kB) - added by SmileyChris on 10/02/07 21:45:56.
"hyper-optimized" method, and showing the downfall of the properties method
4102-modified.patch (5.0 kB) - added by SmileyChris on 10/02/07 22:56:08.
Latest version of the patch, reimplemented checking against the existing value after a discussion with Collin because otherwise user input like forms/etc is going to mark everything modified regardless
4102.patch (4.8 kB) - added by Collin Grady <cgrady@the-magi.us> on 02/15/08 00:10:22.
fixed glitch with foreignkeys
4102.2.patch (6.2 kB) - added by cgrady on 05/20/08 17:41:53.
post-qsrf patch

Change History

04/20/07 20:57:57 changed by Collin Grady <cgrady@the-magi.us>

  • attachment save_fields.patch added.

04/21/07 07:43:34 changed by Simon G. <dev@simon.net.nz>

  • needs_better_patch changed.
  • stage changed from Unreviewed to Design decision needed.
  • needs_tests set to 1.
  • needs_docs set to 1.

I like this functionality for the stated performance reasons, but I can't help thinking that this could be done automatically by just looking for anything that's changed (although the tracking of before and after states may be more trouble than it's worth!).

If accepted, this would need a brief bit of documentation for database-api too (just a description and an example). Some tests would be good too.

04/21/07 07:52:46 changed by mtredinnick

Simon, tracking "before" vs. "after" states can't really be done automatically, since we would have to trap every attribute access to the class (which are, after all, just normal Python attribute accesses). Doing an extra database lookup to work out this information would also be bad.

I'm slightly in favour of this feature at the moment, but still thinking about it. I think it would need to be implemented with a manual list of the fields, as in Collin's patc if we did it.

05/06/07 20:31:59 changed by Paul Smith <paulsmith@pobox.com>

  • attachment 4102-only-udpate-fields.patch added.

Patch with unit tests and docs

05/06/07 20:43:28 changed by Paul Smith <paulsmith@pobox.com>

I've uploaded a patch based on Collin's original (using only_fields instead of save_fields as the keyword argument -- this implies a possible except_fields keyword argument, or perhaps exclude=False|True which defaults to False), including unit tests and docs.

Should an exception be thrown if invalid field names are given in the list?

06/21/07 19:37:12 changed by Collin Grady <cgrady@the-magi.us>

This came up again at work and it was mentioned that you should be able to override __setattr__ for the meta class to trap attribute changes and set a dirty flag for various columns, which could enable some automation of this.

06/21/07 23:42:47 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty.patch added.

initial shot at 'dirty' flag, seems to work so far

06/22/07 19:05:52 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty.2.patch added.

updated patch to only set dirty if the new value is different than current

06/22/07 23:05:11 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty.3.patch added.

fixed another bug - _dity check died if the field didn't exist but was being assigned to

06/29/07 04:14:56 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty.4.patch added.

06/29/07 04:19:45 changed by Collin Grady <cgrady@the-magi.us>

Found another issue in the previous patches - if you created an object, for instance Poll(id=1, question='What color is the sky?') and then tried to save it, it would not recognize that you had modified those attributes.

Added a kwarg to __init__ to decide if the dirty flags should be washed at the end of __init__, and modified the model creation in query.py to use the flag, so that queries from the database are not flagged as dirty, but manual object creation can be done either way.

08/08/07 23:54:12 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty.5.patch added.

Updated patch against [5832], fixed bug with kwargs instantiation and dirty flags

08/09/07 00:24:33 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.patch added.

converted patch to use set on suggestion of SmileyChris?, also found another bug when you used a kwargs of a field name instead of attname

08/09/07 17:03:07 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.2.patch added.

removed keyword from __init__, added call to _wash in query.py

08/09/07 17:03:55 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.3.patch added.

woops, removed old commented line

08/09/07 17:22:37 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.4.patch added.

sigh, patch hates me - fixed duplicate test issue

08/09/07 17:33:08 changed by SmileyChris

  • needs_tests deleted.

This is looking nice and shiny. With some docs, it'd be ready for core IMO.

08/16/07 16:46:26 changed by SmileyChris

  • attachment 4102-dirty-set.5.patch added.

Cleaner implementation: doesn't mess with init

08/16/07 17:08:44 changed by SmileyChris

  • attachment 4102-dirty-set.6.patch added.

small fix (a del exception is an AttributeError, not a KeyError)

08/16/07 17:25:51 changed by SmileyChris

Malcolm asked about the performance impact. It doesn't look good. Here's a quick mock test I created:

>>> from timeit import Timer

# A control model
>>> setup_a = '''
class Model(object):
    pass
m = Model()
'''

# A model which has a __setattr__ which does nothing
>>> setup_b = '''
class Model(object):
    def __setattr__(self, name, value):
        super(Model, self).__setattr__(name, value)
m = Model()
'''

# A model which has a __setattr__ which matches this patch
>>> setup_c = '''
class Model(object):
    def __setattr__(self, name, value):
        if name != '_modified_attrs' and (not hasattr(self, name) or
                                          value != getattr(self, name)):
            if hasattr(self, '_modified_attrs'):
                if value not in self._modified_attrs: 
                    self._modified_attrs.add(name)
            else:
                self._modified_attrs = set((value,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

>>> Timer('m.an_attribute="test"', setup_a).timeit()
0.17214338693884201

>>> Timer('m.an_attribute="test"', setup_b).timeit()
1.6591357281431556

>>> Timer('m.an_attribute="test"', setup_c).timeit()
2.6405875607088092

08/21/07 23:24:47 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.7.patch added.

small fix to the set creation - need to add name to the set, not the value

08/22/07 01:18:18 changed by Brian Rosner <brosner@gmail.com>

I would think by doing value comparisons on the data is a main cause for the overhead. To mark something dirty do it based on the setting of the attribute only and not based on what data is being changed. If I am having to update a model with a 3.4 MB chunk of data it will be comparing that data with the 3 MB that may already be in there. At that point I don't care about only updating the fields I change since it will take much more time than just updating it.

08/22/07 03:40:45 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.8.patch added.

fixed another value/name mixup, optimized a few operations

08/23/07 04:55:37 changed by SmileyChris

More performance checks:

# A model which has a __setattr__ which *really* matches this patch
>>> setup_c = '''
class Model(object):
    def __setattr__(self, name, value):
        if name != '_modified_attrs':
            if hasattr(self, '_modified_attrs'):
                if name not in self._modified_attrs: 
                    self._modified_attrs.add(name)
            else:
                self._modified_attrs = set((name,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

# Optimized for speed, and dropped the value check as Brian suggested
>>> setup_d = '''
class Model(object):
    def __setattr__(self, name, value):
        try:
            if name not in self._modified_attrs: 
                self._modified_attrs.add(name)
        except AttributeError:
            if name != '_modified_attrs':
                self._modified_attrs = set((name,))
        super(Model, self).__setattr__(name, value)
m = Model()
'''

>>> Timer('m.an_attribute="test"', setup_a).timeit()
0.17163466306471914
>>> Timer('m.an_attribute="test"', setup_b).timeit()
1.7086492836380041
>>> Timer('m.an_attribute="test"', setup_c).timeit()
2.3102257409810463
>>> Timer('m.an_attribute="test"', setup_d).timeit()
1.9328342009948187

So setup_d is only 13% slower than a plain __setattr__ override... that's not bad...

08/23/07 04:58:42 changed by SmileyChris

(actually, I dropped the value check in setup_c too, so it's still more like 2.71 for the speed of a __setattr__ which matches the current patch)

08/23/07 16:07:01 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-dirty-set.9.patch added.

Fixed duplicate tests due to svn glitch again

08/23/07 16:08:09 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102-timings.py added.

perf test script

(follow-up: ↓ 13 ) 08/23/07 16:19:10 changed by Collin Grady <cgrady@the-magi.us>

Attached the earlier tests (slightly corrected) - it seems that the majority of the extra time used is merely from defining __setattr__, strangely enough...

Results of script, showing single vs double:

Single Assignment
Control:   0.29817199707
noop:      1.92452192307
patch:     3.07881999016
optimized: 2.29438686371

Double Assignment
Control:   0.298352003098
Noop:      1.92282295227
Patch:     3.08132410049
Optimized: 2.29805707932

The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).

If the performance difference caused by the __setattr__ stuff is too much, I think the original manual idea of being able to specify save/exclude fields in save() could be a decent compromise.

09/10/07 10:00:24 changed by anonymous

BTW the latest patch obviously is very backwards-incompatible for anyone using obj.dict.update() to update a lot of attributes at once, or using obj.dict.update(form.cleaned_data) as I've seen on a few blog posts.

10/01/07 23:30:03 changed by SmileyChris

  • attachment 4102-timings.2.py added.

new properties method, faster than noop :)

(in reply to: ↑ 11 ) 10/01/07 23:34:55 changed by SmileyChris

Replying to Collin Grady <cgrady@the-magi.us>:

The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).

It's because you were doing it in set up, not in the actual test!

Results of script:

Single Assignment
Control:   0.168
noop:      1.689 (10.0x)
patch:     2.581 (15.3x)
optimized: 1.872 (11.1x)
property:  1.494 (8.9x)

Double Assignment
Control:   0.368
noop:      3.400 (9.2x)
patch:     5.222 (14.2x)
optimized: 3.887 (10.6x)
property:  2.951 (8.0x)

10/02/07 21:45:56 changed by SmileyChris

  • attachment 4102-timings.3.py added.

"hyper-optimized" method, and showing the downfall of the properties method

10/02/07 21:48:21 changed by SmileyChris

Ok, so I've got the timing down to under 6x control.

I think it's also important to mention that the writing of an attribute isn't really where most of the time is going to be taken up anyway - it's not like this is slowing down model access or saving by any real measure.

Single Assignment
Control:   0.170
noop:      1.939 (11.4x)
patch:     2.566 (15.1x)
optimized: 1.852 (10.9x)
noop-h-op: 0.787 (4.6x)
h-optimiz: 0.988 (5.8x)
property:  1.473 (8.6x)
new-attr:  1.779 (10.4x)

Retreiving
Control:   0.139
noop:      0.137 (1.0x)
patch:     0.177 (1.3x)
optimized: 0.162 (1.2x)
noop-h-op: 0.136 (1.0x)
h-optimiz: 0.135 (1.0x)
property:  0.787 (5.7x)
new-attr:  0.992 (7.2x)

10/02/07 22:56:08 changed by SmileyChris

  • attachment 4102-modified.patch added.

Latest version of the patch, reimplemented checking against the existing value after a discussion with Collin because otherwise user input like forms/etc is going to mark everything modified regardless

(follow-up: ↓ 16 ) 10/09/07 20:12:08 changed by andraz@zemanta.com

Two things: 1. while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field. This is a great debugging mechanism to weed out incidental typos which usually lead to quiet dataloss until you notice it at some time much too late.

2. Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare? I don't really believe that is the case which people need this db performance advantage for... Usually save() gets performance-problematic when you want to change your data in bulk... and you just need to update a counter or a date field for thousands of records, but don't want to touch all the textfields

(in reply to: ↑ 15 ) 10/09/07 20:23:01 changed by SmileyChris

Replying to andraz@zemanta.com:

Two things: 1. while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field.

That is well outside of the scope of this ticket.

2. Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare?

You're not being very clear, but I think you are asking why I re-implemented the change to check against existing values.

The point in that change isn't optimization but to help with the minor threading issues. And I think you may have overlooked the fact that it only does this comparison for fields when they are being set to ensure they really are "changing".

11/27/07 04:24:00 changed by Yopi <yannvr@gmail.com>

  • cc set to yannvr@gmail.com.

12/11/07 03:48:54 changed by Simon Litchfield <simon@quo.com.au>

  • cc changed from yannvr@gmail.com to yannvr@gmail.com, simon@quo.com.au.

Why not just set self.db_values = db_values under save() and get() etc? Then compare current values to db_values during the loop that compiles the INSERT/UPDATEs?

12/11/07 04:00:12 changed by Collin Grady <cgrady@the-magi.us>

It would likely be less efficient to store an entire second copy of all model data when you can simply store a modified flag. Especially if you had a large number of fields or fields with large amounts of data in them.

12/11/07 04:02:26 changed by Simon Litchfield <simon@quo.com.au>

Sorry just read Brian's point, re the big blob data scenario. Agreed setattr overloading does sound better in that light, +1 :-)

Another point. Rather than suggest it should be kept _private, why not make it clearly available and show in the docs. It will be very useful for deciding when to update related tables or not, during save() overrides etc.

See this is ugly --

def save(self):
  if not self.id:
    retag = True
  else:
    # Unfortunately, we need to hit the db to see if things have changed
    check = Tag.objects.get(pk=self.id)
    retag = (check.name != self.name)
  super(Tag, self).save()
  if retag:
    self.retag_articles()

Be great to be able to do something like this --

def save(self):
  retag = ('name' in self.modified_fields)
  super(Tag, self).save()
  if retag:
    self.retag_articles()

12/11/07 04:08:12 changed by Collin Grady <cgrady@the-magi.us>

I don't see how anything stops you from reading it like that already with this patch, it isn't a __ var, and you would then be internal to the model anyway, so it seems fine to me to read from it as is :)

12/12/07 21:10:24 changed by Andrew Badr <andrewbadr.etc@gmail.com>

  • cc changed from yannvr@gmail.com, simon@quo.com.au to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com.

12/18/07 03:21:58 changed by SmileyChris

  • needs_better_patch set to 1.

Need to write a new patch

(22:09:40) Magus-: SmileyChris: oh, btw, your most recent patch for 4102 is busted, I forgot to talk to you about it earlier :)
(22:09:50) SmileyChris: oh noes!
(22:09:56) SmileyChris: Magus-: what did I bust?
(22:09:59) Magus-: you can't use self.__dict__ like that
(22:10:03) Magus-: it breaks on fkeys and other things
(22:10:09) Magus-: well, only fkeys so far

01/05/08 20:30:43 changed by ferringb@gmail.com

  • cc changed from yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com.

02/12/08 12:20:17 changed by dcramer

  • cc changed from yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com.

02/15/08 00:10:22 changed by Collin Grady <cgrady@the-magi.us>

  • attachment 4102.patch added.

fixed glitch with foreignkeys

03/21/08 21:22:41 changed by PhiR

  • cc changed from yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu.

03/26/08 11:01:07 changed by guettli

  • cc changed from yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu, hv@tbz-pariv.de.

05/20/08 12:25:59 changed by cgrady

  • needs_better_patch deleted.

05/20/08 17:41:53 changed by cgrady

  • attachment 4102.2.patch added.

post-qsrf patch

06/14/08 16:30:22 changed by Simon Willison

You've obviously spent a lot of time investigating the performance overhead incurred by over-riding setattr, but have you done any benchmarking of the performance improvement gained by performing significantly smaller UPDATE statements? It may well be that the improvement from the more efficient UPDATEs makes the performance overhead of setattr irrelevant.

06/20/08 12:06:23 changed by jashugan

  • cc changed from yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu, hv@tbz-pariv.de to yannvr@gmail.com, simon@quo.com.au, andrewbadr.etc@gmail.com, ferringb@gmail.com, dcramer@gmail.com, philippe.r@oult.eu, hv@tbz-pariv.de, jashugan@gmail.com.

Add/Change #4102 (Allow UPDATE of only specific fields in model.save())




Change Properties
Action