Determine UPDATE or INSERT without running a SELECT statement
|Reported by:||Owned by:||nobody|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Right now, every time one calls save() on a model that has a primary key set, Django must execute two SQL statements: a SELECT to see if an item exists in the database with that primary key and then the appropriate UPDATE or INSERT. So, every UPDATE operation becomes a two SQL call process. For most purposes, it probably isn't a big deal, but I like efficiency.
Clearly, you can't just take the presence/absence of a primary key to determine whether you should update/insert. But Rails actually determines it without doing a select statement. Basically, each ActiveRecord object in Rails has a boolean new_record flag that tells you whether it's a new record or not. If new_record is true, insert, else update.
So, basically, it could work like this:
>>>p = Person.objects.get(pk=23) >>>p._new_record False >>>p2 = Person(name="Sean", fav_ice_cream="Mint") >>>p2._new_record True
This wouldn't be a small change and *would* change one part of the API. Currently the docs read: "The one gotcha here is that you should be careful not to specify a primary-key value explicitly when saving new objects, if you cannot guarantee the primary-key value is unused." This would change it so that if you specified a pk that was a duplicate, it would be caught when obj._new_record = True and it tried an insert and got a duplicate key error. Currently, the system silently updates whatever has the duplicate pk.
While this would change the behavior, the documentation on it seems to indicate that it would be desirable for an error to occur with many warnings that you'll get an unintended update rather than an error should you duplicate a key.
So, this change would save a SQL call for all UPDATES and correct what the documentation indicates is a bad situation that should (but doesn't) give you a duplicate key error. It doesn't look like a small change though. save_base() would need to be changed in db/models/base.py and the _new_record attribute would need to be properly set based on whether the object was initialized from the database or not. Someone else should probably comment on the feasibility of the code changes.
Relavant documentation sections:
P.S. If one wanted to maintain backward compatibility, it could be implemented so that it uses the _new_record boolean to determine update/insert, but if an insert gives a duplicate key error, it then just runs the update. Since this case shouldn't be the norm, it would still cut down on a lot of SQL executions.