Opened 3 years ago

Last modified 8 months ago

#21454 assigned New feature

Ignoring certain fields on INSERT and UPDATE queries

Reported by: mpessas Owned by: owais
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: oracle
Cc: mmanfre@…, shai, me@…, loneowais@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There are cases that you would like to let the database handle certain fields. For instance,

  • Set fields to their default value at the database-level (such as a created timestamp).
  • Have the value of fields computed always by the database (e.g. with a trigger).
  • Have virtual fields in the database etc.

The first case is especially important for timestamps, since setting the default to NOW() will make sure (at least in postgres) that all rows inserted under the same transaction will have the same timestamp (compared to how auto_now_add works). This means that, when you insert a lot of objects in one transaction, they will all have the exact same timestamp. Moreover, relying on the database for setting default values to fields reduces the amount of data that need to be sent to
the database server.

The second and third case allow to move some of the logic to the database and take advantage of some features they offer as well as speed up some things.

In order to achieve this (without resorting to hacks, like ignoring the values that django sends in the database), django should ignore these fields, when constructing an INSERT or UPDATE query. This could be done by allowing a field to tell django to ignore it on INSERT or UPDATE queries, the same way it does now for AutoFields.

You can find an implementation of this approach at https://github.com/mpessas/django/commit/a663d8f95d9cca5112ccd33e3a303cae82908b60 and a demo project at https://github.com/mpessas/dbfield (timestamp app).

Change History (23)

comment:1 Changed 3 years ago by mpessas

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to mpessas
  • Patch needs improvement unset
  • Status changed from new to assigned

Some initial comments can be found at https://groups.google.com/forum/#!topic/django-developers/rwP2DcPYHEs.

I agree with making use_on_insert and use_on_update instance variables instead, but my main concern is that this probably makes the variables as important as the rest of the aguments the field can take. However, I suppose most people should stay away from them. That is why I wanted to make them a bit more internal to the field class. But I guess this is actually a matter of how you document things. If the patch is merged, should there be documentation in the Field reference page?

By the way, there is one thing I forgot to mention in my initial email, that since the value of a field is not created from within django, the object created or updated will not have the database-assigned value. For instance, if there is a created field that takes its value from the database, django will know that the value of the field is None and you will need to read the model instance again to fetch the correct values. But I am sure that people who need this would not mind. In any case, this is another reason why I think this should be more of an internal thing (at least as far as documentation is concerned).

Regarding save, yes this is a way to achieve a similar effect now, but only for UPDATEs.

comment:2 follow-up: Changed 3 years ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.

Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields

Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.

In addition, you will need also ModelForms support.

In general I see this feature as useful when you need it, so marking as accepted.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by mpessas

Replying to akaariai:

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.

Regarding the RETURNING clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING clause, when available, and fallback to a re-read query otherwise.

In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING clause is not possible. How does that sound?

Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.

Done at https://github.com/mpessas/django/commit/eda1b860311496d37bd6609b710f69fa4ae6fc41.

Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields

Yes, that was my idea as well. The .update() method works as expected (see https://github.com/mpessas/django/commit/4fcfed999e82b08182b6c614553e7d707fdf4a6d).

Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.

Fixed at https://github.com/mpessas/django/commit/7540af0471346d62e60f1c61c27638b3233d4d5b. However, I am not familiar with the process and as a result confident with the commit; are we sure that the raw flag is used only for fixture loading? Should I research a bit more?

In addition, you will need also ModelForms support.

I suppose you mean that the fields should be excluded by default in ModelForms.

I can think of three ways to do it:

  • Do it in fields_for_model, that is, exclude all use_on_insert/use_on_update fields, unless they are explicitly included in the fields variable of the form. The big issue here is that you cannot tell, whether the form will be used to create or update a row in order to handle use_on_insert and use_on_update fields differently; you have to exclude both. See commit https://github.com/mpessas/django/commit/3d298a6f269e28aa77e290aa4299cb17e3a6185b.
  • Do it in the form's initializer. In this case, you do have access to the instance variable; thus, you can tell whether the form is going to create or update a row. But this would not be consistent; the developer would get a different set of fields in the two cases. Moreover, the field selection process is part of the metaclass; as a result, the code to exclude the db-default fields belongs to the metaclass as well.
  • Do it in the __call__ method of the metaclass. This keeps all field-selection code in the metaclass, but still has the problem of consistency.

I am not sure which choice is best. The first one seems simpler to me (from the point of view of the user of a ModelForm), since it avoids the inconsistency issue (which would also leak to templates etc). Any opinions here?

In general I see this feature as useful when you need it, so marking as accepted.

Thanks!

comment:4 in reply to: ↑ 3 ; follow-up: Changed 3 years ago by mpessas

Replying to mpessas:

Replying to akaariai:

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.

Regarding the RETURNING clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING clause, when available, and fallback to a re-read query otherwise.

In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING clause is not possible. How does that sound?

Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False (which is the default).

There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.

Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?

comment:5 in reply to: ↑ 4 ; follow-up: Changed 3 years ago by manfre

Replying to mpessas:

Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False (which is the default).

There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.

Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?

.

It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.

MSSQL supports the OUTPUT clause, which is similar to RETURNING, but not exactly the same. Once I add Django 1.7 support to django-mssql, I'll test out your branch. The primary database I develop django-mssql against is a heavy user of readonly fields (computed fields, populated with triggers/SSIS, etc). We've made due with a Manager that lets me define the readonly fields and having to manually refresh the instance after certain inserts.

comment:6 Changed 3 years ago by manfre

This ticket seems to only address fields that are ALWAYS updated by a trigger, but not fields that MAY be updated by a trigger (or something else working in/against the database), but doesn't necessarily require blocking changes on either insert or update.

This scenario seems like it should be covered by this ticket too. If so, another field argument always_refresh could handle that. When always_refresh=True, it would instruct the ORM to fetch the field after an insert or update with either use of a RETURNING clause or a separate fetch. When always_refresh=False, it would have no action, unless use_on_insert=False or use_on_update=False.

A third field argument feels excessive, especially since you'd need to define all three for the common case of a read-only field. The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:

DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False

comment:7 Changed 3 years ago by manfre

  • Cc mmanfre@… added

comment:8 follow-up: Changed 3 years ago by shai

  • Cc shai added
  • Keywords oracle added

@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.

comment:9 in reply to: ↑ 5 Changed 3 years ago by mpessas

Replying to manfre:

It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.

True, this has been implemented at https://github.com/mpessas/django/commit/c7ca041347b9c8b24285cb0c94ba3eefeded707e.

Replying to manfre:

The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:

DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False

That is a good idea, see commit https://github.com/mpessas/django/commit/a8a19b65774383bc47d3fb04ca6966ba4a4cba94 (a better name is welcome).

The always_refresh feature sounds interesting and I will take a look; it should not be that hard.

Keep in mind, by the way, that as far as I can tell and at least in PostgreSQL the RETURNING clause does not "work" with an after trigger; that is, if a field is changed by an AFTER INSERT or AFTER UPDATE trigger will not be returned with a RETURNING clause.

comment:10 in reply to: ↑ 8 Changed 3 years ago by mpessas

Replying to shai:

@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.

Great, thanks! Right now the Oracle backend uses a second query to read any updated values, even though it supports the RETURNING INTO clause. However, I am not that familiar with Oracle to be sure how to make this work (or test anything).

As far as I can tell, Oracle needs to create "bind" variables with a given type. Would it work, if I used a list of Fields instead of names (right now the list of returning values is a list of field names)? That way, I guess I should be (somehow, maybe via the db_type property) able to figure out the type of each field in the returning values list. How does this sound? I could work on this, but like I said I have no way to actually test any changes.

By the way, you can email me directly, if we need to coordinate this; it is a gmail account with this username.

comment:11 Changed 3 years ago by shai

I commented on the PR about schema editing and nulls-vs-empty-string, both in tests. I also started working on the fixes needed for the Oracle backend. It's a little more work than I thought, including a need to define, for each of the built-in field types, the appropriate cx_Oracle data type -- which we were able to avoid so far.

In turn, this means that, for proper testing, we'll probably need to add tests that return each type of field.

comment:12 Changed 3 years ago by mpessas

The issues have been fixed so far. If I can help with writing the tests, let me know.

Just for reference, the two issues I have not looked so far are how the changes work with model inheritance (probably there are no more changes needed) and the always_refresh option described above.

comment:13 Changed 3 years ago by mpessas

I meant, the issues mentioned in the pull request. Sorry.

comment:14 Changed 3 years ago by mpessas

I just took a look at model inheritance and _save_parents calls _save_table as well, so everything should work in this use-case.

comment:15 Changed 3 years ago by shai

#19527, dealing with returned ids in bulk_create() is related, at least as far as Oracle is concerned.

comment:16 Changed 3 years ago by shai

I made some comments on the PR (I don't think it is properly linked here -- https://github.com/django/django/pull/2149 -- now it is). Additionally, I added a commit on a branch based on yours that makes Oracle pass all the model_fields and model_forms tests, at https://github.com/shaib/django/tree/auto_fields (sorry, I don't have time right now to run the full suite. Maybe later today).

W.r.t to the tests failing because Oracle returns null strings as empty, I just changed the relevant assertIsNone calls to assertFalse. Hope that is ok too.

comment:17 Changed 3 years ago by timo

  • Patch needs improvement set

comment:18 Changed 12 months ago by adamchainz

  • Cc me@… added

comment:19 Changed 8 months ago by owais

  • Cc loneowais@… added

comment:20 Changed 8 months ago by owais

I am continuing work on this as part of https://github.com/django/django/pull/5904. My intention is to only support backends that actually support this feature i.e, Postgresql using RETURNING and Oracle using RETURNING INTO. I can follow up with another PR to "simulate" the behavior on backends that don't support it. I've yet to write any docs or test cases.

The changes introduce 3 new field options.

return_on_insert

When set to True, django will load the value from the database column along with the insert statement and set it on the model. This is a noop on mysql and sqlite.

return_on_update

When set to True, django will load the value from the database column along with the update statement and set it on the model. This is a noop on mysql and sqlite.

delegated

When set to True, django completely skips this field from insert and update operations and let's the database set a value. It also sets return_on_update and return_on_insert to True if no explicit value is passed for them.

I think the combination of these three options covers almost every option except when all the fields are delegated. In this case, the update statement is never executed as UPDATE statement does not work without SET clause and we can't intelligently decide whether to use the existing value in update or not.

One possible way to reduce the impact of edge case would be to add another field called delegate_if_None which would delegate the field to database only when current value of the field is set to None. I'm not sure if such a rare case should be supported though. It can easily by solved by setting return_on_insert and return_on_update to True and then setting default value or adding logic in database triggers.

comment:21 Changed 8 months ago by owais

Update

Moved the delegation logic from models closer to the code that generates the queries. This makes the implementation a bit more robust as most higher level APIs should just work without the need to update anything. It also allowed me to add support to querysets for delegation.

Field Options

delegated=True|False

A shortcut that automatically sets all of the following options to True when no value is provided for them.

delegated_on_insert=True|False

If True, the value of this field will not be skipped from the INSERT statement completely. This automatically sets return_on_insert to True if not value is given. Supported on all backends.

delegated_on_update=True|False

If True, the value of this field will not be skipped from the UPDATE statement completely. This automatically sets return_on_update to True if not value is given. Supported on all backends.

return_on_insert=True|False

If True, the value for field is loaded and assigned from the database along with the INSERT statement. Supported on PostgreSQL using the RETURNING clause and Oracle using the RETURNING INTO clause.

return_on_update=True|False

If True, the value for field is loaded and assigned from the database along with the UPDATE statement. Supported on PostgreSQL using the RETURNING clause and Oracle using the RETURNING INTO clause.

Model API Changes

save() method now takes an optional argument called ignore_delegated which can be a list of delegated field names. This negates the behavior of delegated_on_insert and delegated_on_update.

Queryset/Manager API Changes

Querysets have a new method call ignore_delegated which takes names of delegated fields as arguments and negates the behavior of delegated_on_insert and delegated_on_update for any subsequent operations. For example, if we have a delegated field called created_at on a model, we can temporarily disable delegation to DB for a single query.

class MyModel(models.Model):
     pytthon_ts = fields.DatetimeField()
     db_ts = fields.DatetimeField(delegated=True)


# This will only update python_ts as `db_ts` is delegated to database
MyModel.objects.update(db_ts=now(), python_ts=now())


# However, this will update both
MyModel.objects.ignore_delegated('db_ts').update(db_ts=now(), python_on=now())

The above example also applies to Model.objects.create().

Last edited 8 months ago by owais (previous) (diff)

comment:22 Changed 8 months ago by owais

  • Needs documentation set
  • Owner changed from mpessas to owais

comment:23 Changed 8 months ago by owais

  • Patch needs improvement unset

I've removed 'patch needs improvement' flag for now as we have a new patch and it needs review. Not sure if this is the right way to indicate that an issue needs a review.

Last edited 8 months ago by owais (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top