Opened 12 years ago
Last modified 3 years ago
#21454 new New feature
Ignoring certain fields on INSERT and UPDATE queries
| Reported by: | Apostolis Bessas | Owned by: | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | oracle |
| Cc: | mmanfre@…, Shai Berger, me@…, loneowais@…, Leo Antunes, Petr Přikryl, Michael Rosner | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | yes |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
There are cases that you would like to let the database handle certain fields. For instance,
- Set fields to their default value at the database-level (such as a
createdtimestamp). - Have the value of fields computed always by the database (e.g. with a trigger).
- Have virtual fields in the database etc.
The first case is especially important for timestamps, since setting the default to NOW() will make sure (at least in postgres) that all rows inserted under the same transaction will have the same timestamp (compared to how auto_now_add works). This means that, when you insert a lot of objects in one transaction, they will all have the exact same timestamp. Moreover, relying on the database for setting default values to fields reduces the amount of data that need to be sent to
the database server.
The second and third case allow to move some of the logic to the database and take advantage of some features they offer as well as speed up some things.
In order to achieve this (without resorting to hacks, like ignoring the values that django sends in the database), django should ignore these fields, when constructing an INSERT or UPDATE query. This could be done by allowing a field to tell django to ignore it on INSERT or UPDATE queries, the same way it does now for AutoFields.
You can find an implementation of this approach at https://github.com/mpessas/django/commit/a663d8f95d9cca5112ccd33e3a303cae82908b60 and a demo project at https://github.com/mpessas/dbfield (timestamp app).
Change History (32)
comment:1 by , 12 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
follow-up: 3 comment:2 by , 12 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.
Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields
Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.
In addition, you will need also ModelForms support.
In general I see this feature as useful when you need it, so marking as accepted.
follow-up: 4 comment:3 by , 12 years ago
Replying to akaariai:
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.
Regarding the RETURNING clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING clause, when available, and fallback to a re-read query otherwise.
In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING clause is not possible. How does that sound?
Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.
Done at https://github.com/mpessas/django/commit/eda1b860311496d37bd6609b710f69fa4ae6fc41.
Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields
Yes, that was my idea as well. The .update() method works as expected (see https://github.com/mpessas/django/commit/4fcfed999e82b08182b6c614553e7d707fdf4a6d).
Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.
Fixed at https://github.com/mpessas/django/commit/7540af0471346d62e60f1c61c27638b3233d4d5b. However, I am not familiar with the process and as a result confident with the commit; are we sure that the raw flag is used only for fixture loading? Should I research a bit more?
In addition, you will need also ModelForms support.
I suppose you mean that the fields should be excluded by default in ModelForms.
I can think of three ways to do it:
- Do it in
fields_for_model, that is, exclude alluse_on_insert/use_on_updatefields, unless they are explicitly included in thefieldsvariable of the form. The big issue here is that you cannot tell, whether the form will be used to create or update a row in order to handleuse_on_insertanduse_on_updatefields differently; you have to exclude both. See commit https://github.com/mpessas/django/commit/3d298a6f269e28aa77e290aa4299cb17e3a6185b. - Do it in the form's initializer. In this case, you do have access to the
instancevariable; thus, you can tell whether the form is going to create or update a row. But this would not be consistent; the developer would get a different set of fields in the two cases. Moreover, the field selection process is part of the metaclass; as a result, the code to exclude the db-default fields belongs to the metaclass as well. - Do it in the
__call__method of the metaclass. This keeps all field-selection code in the metaclass, but still has the problem of consistency.
I am not sure which choice is best. The first one seems simpler to me (from the point of view of the user of a ModelForm), since it avoids the inconsistency issue (which would also leak to templates etc). Any opinions here?
In general I see this feature as useful when you need it, so marking as accepted.
Thanks!
follow-up: 5 comment:4 by , 12 years ago
Replying to mpessas:
Replying to akaariai:
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.
Regarding the
RETURNINGclause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use theRETURNINGclause, when available, and fallback to a re-read query otherwise.
In any case, I suppose we should allow the developer to skip the re-read query, if using the
RETURNINGclause is not possible. How does that sound?
Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False (which is the default).
There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.
Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?
follow-up: 9 comment:5 by , 12 years ago
Replying to mpessas:
Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set
force_fetch=False(which is the default).
There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.
Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?
.
It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.
MSSQL supports the OUTPUT clause, which is similar to RETURNING, but not exactly the same. Once I add Django 1.7 support to django-mssql, I'll test out your branch. The primary database I develop django-mssql against is a heavy user of readonly fields (computed fields, populated with triggers/SSIS, etc). We've made due with a Manager that lets me define the readonly fields and having to manually refresh the instance after certain inserts.
comment:6 by , 12 years ago
This ticket seems to only address fields that are ALWAYS updated by a trigger, but not fields that MAY be updated by a trigger (or something else working in/against the database), but doesn't necessarily require blocking changes on either insert or update.
This scenario seems like it should be covered by this ticket too. If so, another field argument always_refresh could handle that. When always_refresh=True, it would instruct the ORM to fetch the field after an insert or update with either use of a RETURNING clause or a separate fetch. When always_refresh=False, it would have no action, unless use_on_insert=False or use_on_update=False.
A third field argument feels excessive, especially since you'd need to define all three for the common case of a read-only field. The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:
DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False
comment:7 by , 12 years ago
| Cc: | added |
|---|
follow-up: 10 comment:8 by , 12 years ago
| Cc: | added |
|---|---|
| Keywords: | oracle added |
@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.
comment:9 by , 12 years ago
Replying to manfre:
It's possible that a database backend is capable of returning the IDs from an insert without supporting the
RETURNINGclause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.
True, this has been implemented at https://github.com/mpessas/django/commit/c7ca041347b9c8b24285cb0c94ba3eefeded707e.
Replying to manfre:
The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:
DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False
That is a good idea, see commit https://github.com/mpessas/django/commit/a8a19b65774383bc47d3fb04ca6966ba4a4cba94 (a better name is welcome).
The always_refresh feature sounds interesting and I will take a look; it should not be that hard.
Keep in mind, by the way, that as far as I can tell and at least in PostgreSQL the RETURNING clause does not "work" with an after trigger; that is, if a field is changed by an AFTER INSERT or AFTER UPDATE trigger will not be returned with a RETURNING clause.
comment:10 by , 12 years ago
Replying to shai:
@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.
Great, thanks! Right now the Oracle backend uses a second query to read any updated values, even though it supports the RETURNING INTO clause. However, I am not that familiar with Oracle to be sure how to make this work (or test anything).
As far as I can tell, Oracle needs to create "bind" variables with a given type. Would it work, if I used a list of Fields instead of names (right now the list of returning values is a list of field names)? That way, I guess I should be (somehow, maybe via the db_type property) able to figure out the type of each field in the returning values list. How does this sound? I could work on this, but like I said I have no way to actually test any changes.
By the way, you can email me directly, if we need to coordinate this; it is a gmail account with this username.
comment:11 by , 12 years ago
I commented on the PR about schema editing and nulls-vs-empty-string, both in tests. I also started working on the fixes needed for the Oracle backend. It's a little more work than I thought, including a need to define, for each of the built-in field types, the appropriate cx_Oracle data type -- which we were able to avoid so far.
In turn, this means that, for proper testing, we'll probably need to add tests that return each type of field.
comment:12 by , 12 years ago
The issues have been fixed so far. If I can help with writing the tests, let me know.
Just for reference, the two issues I have not looked so far are how the changes work with model inheritance (probably there are no more changes needed) and the always_refresh option described above.
comment:14 by , 12 years ago
I just took a look at model inheritance and _save_parents calls _save_table as well, so everything should work in this use-case.
comment:15 by , 12 years ago
#19527, dealing with returned ids in bulk_create() is related, at least as far as Oracle is concerned.
comment:16 by , 12 years ago
I made some comments on the PR (I don't think it is properly linked here -- https://github.com/django/django/pull/2149 -- now it is). Additionally, I added a commit on a branch based on yours that makes Oracle pass all the model_fields and model_forms tests, at https://github.com/shaib/django/tree/auto_fields (sorry, I don't have time right now to run the full suite. Maybe later today).
W.r.t to the tests failing because Oracle returns null strings as empty, I just changed the relevant assertIsNone calls to assertFalse. Hope that is ok too.
comment:17 by , 12 years ago
| Patch needs improvement: | set |
|---|
comment:18 by , 10 years ago
| Cc: | added |
|---|
comment:19 by , 10 years ago
| Cc: | added |
|---|
comment:20 by , 10 years ago
I am continuing work on this as part of https://github.com/django/django/pull/5904. My intention is to only support backends that actually support this feature i.e, Postgresql using RETURNING and Oracle using RETURNING INTO. I can follow up with another PR to "simulate" the behavior on backends that don't support it. I've yet to write any docs or test cases.
The changes introduce 3 new field options.
return_on_insert
When set to True, django will load the value from the database column along with the insert statement and set it on the model. This is a noop on mysql and sqlite.
return_on_update
When set to True, django will load the value from the database column along with the update statement and set it on the model. This is a noop on mysql and sqlite.
delegated
When set to True, django completely skips this field from insert and update operations and let's the database set a value. It also sets return_on_update and return_on_insert to True if no explicit value is passed for them.
I think the combination of these three options covers almost every option except when all the fields are delegated. In this case, the update statement is never executed as UPDATE statement does not work without SET clause and we can't intelligently decide whether to use the existing value in update or not.
One possible way to reduce the impact of edge case would be to add another field called delegate_if_None which would delegate the field to database only when current value of the field is set to None. I'm not sure if such a rare case should be supported though. It can easily by solved by setting return_on_insert and return_on_update to True and then setting default value or adding logic in database triggers.
comment:21 by , 10 years ago
Update
Moved the delegation logic from models closer to the code that generates the queries. This makes the implementation a bit more robust as most higher level APIs should just work without the need to update anything. It also allowed me to add support to querysets for delegation.
Field Options
delegated=True|False
A shortcut that automatically sets all of the following options to True when no value is provided for them.
delegated_on_insert=True|False
If True, the value of this field will not be skipped from the INSERT statement completely. This automatically sets return_on_insert to True if not value is given. Supported on all backends.
delegated_on_update=True|False
If True, the value of this field will not be skipped from the UPDATE statement completely. This automatically sets return_on_update to True if not value is given. Supported on all backends.
return_on_insert=True|False
If True, the value for field is loaded and assigned from the database along with the INSERT statement. Supported on PostgreSQL using the RETURNING clause and Oracle using the RETURNING INTO clause.
return_on_update=True|False
If True, the value for field is loaded and assigned from the database along with the UPDATE statement. Supported on PostgreSQL using the RETURNING clause and Oracle using the RETURNING INTO clause.
Model API Changes
save() method now takes an optional argument called ignore_delegated which can be a list of delegated field names. This negates the behavior of delegated_on_insert and delegated_on_update.
Queryset/Manager API Changes
Querysets have a new method call ignore_delegated which takes names of delegated fields as arguments and negates the behavior of delegated_on_insert and delegated_on_update for any subsequent operations. For example, if we have a delegated field called created_at on a model, we can temporarily disable delegation to DB for a single query.
class MyModel(models.Model):
pytthon_ts = fields.DatetimeField()
db_ts = fields.DatetimeField(delegated=True)
# This will only update python_ts as `db_ts` is delegated to database
MyModel.objects.update(db_ts=now(), python_ts=now())
# However, this will update both
MyModel.objects.ignore_delegated('db_ts').update(db_ts=now(), python_on=now())
The above example also applies to Model.objects.create().
comment:22 by , 10 years ago
| Needs documentation: | set |
|---|---|
| Owner: | changed from to |
comment:23 by , 10 years ago
| Patch needs improvement: | unset |
|---|
I've removed 'patch needs improvement' flag for now as we have a new patch and it needs review. Not sure if this is the right way to indicate that an issue needs a review.
comment:24 by , 9 years ago
| Owner: | changed from to |
|---|
comment:25 by , 9 years ago
#27138 Does not seem related to this bug at all. Are you sure you added your comment to (and took ownership of) the correct ticket?
comment:26 by , 9 years ago
Oh, I meant #27446. I took ownership of the correct ticket I just referenced the bad one, sorry '
comment:27 by , 9 years ago
We posted a new patch for this at https://github.com/django/django/pull/7515
Happy reviewing :D
comment:28 by , 7 years ago
Here is a Postgres-specific solution that doesn't change Django, for anybody who has been patiently waiting..
from psycopg2.extensions import register_adapter, AsIs
from django.db import models, connection
class PostgresDefaultValueType:
pass
class DelegatedCharField(models.CharField):
def get_prep_value(self, value):
return value or PostgresDefaultValueType()
register_adapter(
PostgresDefaultValueType, lambda _: AsIs('DEFAULT'))
class Widget(models.Model):
name = models.CharField(max_length=255)
dbnow = DelegatedCharField(max_length=255, null=True)
with connection.cursor() as cursor:
cursor.execute('alter table widget alter column dbnow add default now()')
Widget.objects.create(name="foo")
With the above magical goodness, Django will send this to Postgres:
insert into widget (name, dbnow) values ('foo', DEFAULT)
instead of
insert into widget (name, dbnow) values ('foo', NULL)
And.... did you know that that will delegate that field to Postgres's default automatically?! I didn't, until yesterday!
comment:29 by , 5 years ago
| Cc: | added |
|---|
comment:30 by , 5 years ago
| Cc: | added |
|---|
comment:31 by , 4 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:32 by , 3 years ago
| Cc: | added |
|---|
Some initial comments can be found at https://groups.google.com/forum/#!topic/django-developers/rwP2DcPYHEs.
I agree with making
use_on_insertanduse_on_updateinstance variables instead, but my main concern is that this probably makes the variables as important as the rest of the aguments the field can take. However, I suppose most people should stay away from them. That is why I wanted to make them a bit more internal to the field class. But I guess this is actually a matter of how you document things. If the patch is merged, should there be documentation in the Field reference page?By the way, there is one thing I forgot to mention in my initial email, that since the value of a field is not created from within django, the object created or updated will not have the database-assigned value. For instance, if there is a
createdfield that takes its value from the database, django will know that the value of the field isNoneand you will need to read the model instance again to fetch the correct values. But I am sure that people who need this would not mind. In any case, this is another reason why I think this should be more of an internal thing (at least as far as documentation is concerned).Regarding
save, yes this is a way to achieve a similar effect now, but only for UPDATEs.