Opened 11 years ago
Last modified 2 years ago
#21454 new New feature
Ignoring certain fields on INSERT and UPDATE queries
Reported by: | Apostolis Bessas | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | oracle |
Cc: | mmanfre@…, Shai Berger, me@…, loneowais@…, Leo Antunes, Petr Přikryl, Michael Rosner | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
There are cases that you would like to let the database handle certain fields. For instance,
- Set fields to their default value at the database-level (such as a
created
timestamp). - Have the value of fields computed always by the database (e.g. with a trigger).
- Have virtual fields in the database etc.
The first case is especially important for timestamps, since setting the default to NOW()
will make sure (at least in postgres) that all rows inserted under the same transaction will have the same timestamp (compared to how auto_now_add works). This means that, when you insert a lot of objects in one transaction, they will all have the exact same timestamp. Moreover, relying on the database for setting default values to fields reduces the amount of data that need to be sent to
the database server.
The second and third case allow to move some of the logic to the database and take advantage of some features they offer as well as speed up some things.
In order to achieve this (without resorting to hacks, like ignoring the values that django sends in the database), django should ignore these fields, when constructing an INSERT or UPDATE query. This could be done by allowing a field to tell django to ignore it on INSERT or UPDATE queries, the same way it does now for AutoFields.
You can find an implementation of this approach at https://github.com/mpessas/django/commit/a663d8f95d9cca5112ccd33e3a303cae82908b60 and a demo project at https://github.com/mpessas/dbfield (timestamp app).
Change History (32)
comment:1 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 3 comment:2 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.
Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields
Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.
In addition, you will need also ModelForms support.
In general I see this feature as useful when you need it, so marking as accepted.
follow-up: 4 comment:3 by , 11 years ago
Replying to akaariai:
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.
Regarding the RETURNING
clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING
clause, when available, and fallback to a re-read query otherwise.
In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING
clause is not possible. How does that sound?
Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.
Done at https://github.com/mpessas/django/commit/eda1b860311496d37bd6609b710f69fa4ae6fc41.
Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields
Yes, that was my idea as well. The .update()
method works as expected (see https://github.com/mpessas/django/commit/4fcfed999e82b08182b6c614553e7d707fdf4a6d).
Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.
Fixed at https://github.com/mpessas/django/commit/7540af0471346d62e60f1c61c27638b3233d4d5b. However, I am not familiar with the process and as a result confident with the commit; are we sure that the raw
flag is used only for fixture loading? Should I research a bit more?
In addition, you will need also ModelForms support.
I suppose you mean that the fields should be excluded by default in ModelForms.
I can think of three ways to do it:
- Do it in
fields_for_model
, that is, exclude alluse_on_insert
/use_on_update
fields, unless they are explicitly included in thefields
variable of the form. The big issue here is that you cannot tell, whether the form will be used to create or update a row in order to handleuse_on_insert
anduse_on_update
fields differently; you have to exclude both. See commit https://github.com/mpessas/django/commit/3d298a6f269e28aa77e290aa4299cb17e3a6185b. - Do it in the form's initializer. In this case, you do have access to the
instance
variable; thus, you can tell whether the form is going to create or update a row. But this would not be consistent; the developer would get a different set of fields in the two cases. Moreover, the field selection process is part of the metaclass; as a result, the code to exclude the db-default fields belongs to the metaclass as well. - Do it in the
__call__
method of the metaclass. This keeps all field-selection code in the metaclass, but still has the problem of consistency.
I am not sure which choice is best. The first one seems simpler to me (from the point of view of the user of a ModelForm), since it avoids the inconsistency issue (which would also leak to templates etc). Any opinions here?
In general I see this feature as useful when you need it, so marking as accepted.
Thanks!
follow-up: 5 comment:4 by , 11 years ago
Replying to mpessas:
Replying to akaariai:
It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.
I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.
Regarding the
RETURNING
clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use theRETURNING
clause, when available, and fallback to a re-read query otherwise.
In any case, I suppose we should allow the developer to skip the re-read query, if using the
RETURNING
clause is not possible. How does that sound?
Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False
(which is the default).
There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.
Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?
follow-up: 9 comment:5 by , 11 years ago
Replying to mpessas:
Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set
force_fetch=False
(which is the default).
There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.
Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?
.
It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING
clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.
MSSQL supports the OUTPUT
clause, which is similar to RETURNING
, but not exactly the same. Once I add Django 1.7 support to django-mssql, I'll test out your branch. The primary database I develop django-mssql against is a heavy user of readonly fields (computed fields, populated with triggers/SSIS, etc). We've made due with a Manager that lets me define the readonly fields and having to manually refresh the instance after certain inserts.
comment:6 by , 11 years ago
This ticket seems to only address fields that are ALWAYS updated by a trigger, but not fields that MAY be updated by a trigger (or something else working in/against the database), but doesn't necessarily require blocking changes on either insert or update.
This scenario seems like it should be covered by this ticket too. If so, another field argument always_refresh
could handle that. When always_refresh=True
, it would instruct the ORM to fetch the field after an insert or update with either use of a RETURNING
clause or a separate fetch. When always_refresh=False
, it would have no action, unless use_on_insert=False
or use_on_update=False
.
A third field argument feels excessive, especially since you'd need to define all three for the common case of a read-only field. The use_on_insert
and use_on_update
arguments could get merged in to an edit_behavior
argument (probably needs a better name) that expects one of the following constant values:
DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False
comment:7 by , 11 years ago
Cc: | added |
---|
follow-up: 10 comment:8 by , 11 years ago
Cc: | added |
---|---|
Keywords: | oracle added |
@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.
comment:9 by , 11 years ago
Replying to manfre:
It's possible that a database backend is capable of returning the IDs from an insert without supporting the
RETURNING
clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.
True, this has been implemented at https://github.com/mpessas/django/commit/c7ca041347b9c8b24285cb0c94ba3eefeded707e.
Replying to manfre:
The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:
DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False
That is a good idea, see commit https://github.com/mpessas/django/commit/a8a19b65774383bc47d3fb04ca6966ba4a4cba94 (a better name is welcome).
The always_refresh
feature sounds interesting and I will take a look; it should not be that hard.
Keep in mind, by the way, that as far as I can tell and at least in PostgreSQL the RETURNING
clause does not "work" with an after trigger; that is, if a field is changed by an AFTER INSERT
or AFTER UPDATE
trigger will not be returned with a RETURNING
clause.
comment:10 by , 11 years ago
Replying to shai:
@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.
Great, thanks! Right now the Oracle backend uses a second query to read any updated values, even though it supports the RETURNING INTO
clause. However, I am not that familiar with Oracle to be sure how to make this work (or test anything).
As far as I can tell, Oracle needs to create "bind" variables with a given type. Would it work, if I used a list of Field
s instead of names (right now the list of returning values is a list of field names)? That way, I guess I should be (somehow, maybe via the db_type
property) able to figure out the type of each field in the returning values list. How does this sound? I could work on this, but like I said I have no way to actually test any changes.
By the way, you can email me directly, if we need to coordinate this; it is a gmail account with this username.
comment:11 by , 11 years ago
I commented on the PR about schema editing and nulls-vs-empty-string, both in tests. I also started working on the fixes needed for the Oracle backend. It's a little more work than I thought, including a need to define, for each of the built-in field types, the appropriate cx_Oracle data type -- which we were able to avoid so far.
In turn, this means that, for proper testing, we'll probably need to add tests that return each type of field.
comment:12 by , 11 years ago
The issues have been fixed so far. If I can help with writing the tests, let me know.
Just for reference, the two issues I have not looked so far are how the changes work with model inheritance (probably there are no more changes needed) and the always_refresh
option described above.
comment:14 by , 11 years ago
I just took a look at model inheritance and _save_parents
calls _save_table
as well, so everything should work in this use-case.
comment:15 by , 11 years ago
#19527, dealing with returned ids in bulk_create()
is related, at least as far as Oracle is concerned.
comment:16 by , 11 years ago
I made some comments on the PR (I don't think it is properly linked here -- https://github.com/django/django/pull/2149 -- now it is). Additionally, I added a commit on a branch based on yours that makes Oracle pass all the model_fields and model_forms tests, at https://github.com/shaib/django/tree/auto_fields (sorry, I don't have time right now to run the full suite. Maybe later today).
W.r.t to the tests failing because Oracle returns null strings as empty, I just changed the relevant assertIsNone
calls to assertFalse
. Hope that is ok too.
comment:17 by , 11 years ago
Patch needs improvement: | set |
---|
comment:18 by , 9 years ago
Cc: | added |
---|
comment:19 by , 9 years ago
Cc: | added |
---|
comment:20 by , 9 years ago
I am continuing work on this as part of https://github.com/django/django/pull/5904. My intention is to only support backends that actually support this feature i.e, Postgresql using RETURNING
and Oracle using RETURNING INTO
. I can follow up with another PR to "simulate" the behavior on backends that don't support it. I've yet to write any docs or test cases.
The changes introduce 3 new field options.
return_on_insert
When set to True
, django will load the value from the database column along with the insert statement and set it on the model. This is a noop on mysql and sqlite.
return_on_update
When set to True
, django will load the value from the database column along with the update statement and set it on the model. This is a noop on mysql and sqlite.
delegated
When set to True
, django completely skips this field from insert and update operations and let's the database set a value. It also sets return_on_update
and return_on_insert
to True
if no explicit value is passed for them.
I think the combination of these three options covers almost every option except when all the fields are delegated. In this case, the update statement is never executed as UPDATE
statement does not work without SET
clause and we can't intelligently decide whether to use the existing value in update or not.
One possible way to reduce the impact of edge case would be to add another field called delegate_if_None
which would delegate the field to database only when current value of the field is set to None. I'm not sure if such a rare case should be supported though. It can easily by solved by setting return_on_insert
and return_on_update
to True and then setting default value or adding logic in database triggers.
comment:21 by , 9 years ago
Update
Moved the delegation logic from models closer to the code that generates the queries. This makes the implementation a bit more robust as most higher level APIs should just work without the need to update anything. It also allowed me to add support to querysets for delegation.
Field Options
delegated=True|False
A shortcut that automatically sets all of the following options to True when no value is provided for them.
delegated_on_insert=True|False
If True, the value of this field will not be skipped from the INSERT
statement completely. This automatically sets return_on_insert
to True
if not value is given. Supported on all backends.
delegated_on_update=True|False
If True, the value of this field will not be skipped from the UPDATE
statement completely. This automatically sets return_on_update
to True
if not value is given. Supported on all backends.
return_on_insert=True|False
If True, the value for field is loaded and assigned from the database along with the INSERT
statement. Supported on PostgreSQL using the RETURNING
clause and Oracle using the RETURNING INTO
clause.
return_on_update=True|False
If True, the value for field is loaded and assigned from the database along with the UPDATE
statement. Supported on PostgreSQL using the RETURNING
clause and Oracle using the RETURNING INTO
clause.
Model API Changes
save()
method now takes an optional argument called ignore_delegated
which can be a list of delegated field names. This negates the behavior of delegated_on_insert
and delegated_on_update
.
Queryset/Manager API Changes
Querysets have a new method call ignore_delegated
which takes names of delegated fields as arguments and negates the behavior of delegated_on_insert
and delegated_on_update
for any subsequent operations. For example, if we have a delegated field called created_at
on a model, we can temporarily disable delegation to DB for a single query.
class MyModel(models.Model): pytthon_ts = fields.DatetimeField() db_ts = fields.DatetimeField(delegated=True) # This will only update python_ts as `db_ts` is delegated to database MyModel.objects.update(db_ts=now(), python_ts=now()) # However, this will update both MyModel.objects.ignore_delegated('db_ts').update(db_ts=now(), python_on=now())
The above example also applies to Model.objects.create()
.
comment:22 by , 9 years ago
Needs documentation: | set |
---|---|
Owner: | changed from | to
comment:23 by , 9 years ago
Patch needs improvement: | unset |
---|
I've removed 'patch needs improvement' flag for now as we have a new patch and it needs review. Not sure if this is the right way to indicate that an issue needs a review.
comment:24 by , 8 years ago
Owner: | changed from | to
---|
With Ben Cole, we've been tackling this at the DutH sprints, and created a ticket #27138 but then we realized it's a duplicate of this one.
We'll do a PR based on what was described on the other ticket.
comment:25 by , 8 years ago
#27138 Does not seem related to this bug at all. Are you sure you added your comment to (and took ownership of) the correct ticket?
comment:26 by , 8 years ago
Oh, I meant #27446. I took ownership of the correct ticket I just referenced the bad one, sorry '
comment:27 by , 8 years ago
We posted a new patch for this at https://github.com/django/django/pull/7515
Happy reviewing :D
comment:28 by , 6 years ago
Here is a Postgres-specific solution that doesn't change Django, for anybody who has been patiently waiting..
from psycopg2.extensions import register_adapter, AsIs from django.db import models, connection class PostgresDefaultValueType: pass class DelegatedCharField(models.CharField): def get_prep_value(self, value): return value or PostgresDefaultValueType() register_adapter( PostgresDefaultValueType, lambda _: AsIs('DEFAULT')) class Widget(models.Model): name = models.CharField(max_length=255) dbnow = DelegatedCharField(max_length=255, null=True) with connection.cursor() as cursor: cursor.execute('alter table widget alter column dbnow add default now()') Widget.objects.create(name="foo")
With the above magical goodness, Django will send this to Postgres:
insert into widget (name, dbnow) values ('foo', DEFAULT)
instead of
insert into widget (name, dbnow) values ('foo', NULL)
And.... did you know that that will delegate that field to Postgres's default automatically?! I didn't, until yesterday!
comment:29 by , 4 years ago
Cc: | added |
---|
comment:30 by , 4 years ago
Cc: | added |
---|
comment:31 by , 3 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:32 by , 2 years ago
Cc: | added |
---|
Some initial comments can be found at https://groups.google.com/forum/#!topic/django-developers/rwP2DcPYHEs.
I agree with making
use_on_insert
anduse_on_update
instance variables instead, but my main concern is that this probably makes the variables as important as the rest of the aguments the field can take. However, I suppose most people should stay away from them. That is why I wanted to make them a bit more internal to the field class. But I guess this is actually a matter of how you document things. If the patch is merged, should there be documentation in the Field reference page?By the way, there is one thing I forgot to mention in my initial email, that since the value of a field is not created from within django, the object created or updated will not have the database-assigned value. For instance, if there is a
created
field that takes its value from the database, django will know that the value of the field isNone
and you will need to read the model instance again to fetch the correct values. But I am sure that people who need this would not mind. In any case, this is another reason why I think this should be more of an internal thing (at least as far as documentation is concerned).Regarding
save
, yes this is a way to achieve a similar effect now, but only for UPDATEs.