Opened 22 months ago

Last modified 19 months ago

#21454 assigned New feature

Ignoring certain fields on INSERT and UPDATE queries

Reported by: mpessas Owned by: mpessas
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: oracle
Cc: mmanfre@…, shai Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

There are cases that you would like to let the database handle certain fields. For instance,

  • Set fields to their default value at the database-level (such as a created timestamp).
  • Have the value of fields computed always by the database (e.g. with a trigger).
  • Have virtual fields in the database etc.

The first case is especially important for timestamps, since setting the default to NOW() will make sure (at least in postgres) that all rows inserted under the same transaction will have the same timestamp (compared to how auto_now_add works). This means that, when you insert a lot of objects in one transaction, they will all have the exact same timestamp. Moreover, relying on the database for setting default values to fields reduces the amount of data that need to be sent to
the database server.

The second and third case allow to move some of the logic to the database and take advantage of some features they offer as well as speed up some things.

In order to achieve this (without resorting to hacks, like ignoring the values that django sends in the database), django should ignore these fields, when constructing an INSERT or UPDATE query. This could be done by allowing a field to tell django to ignore it on INSERT or UPDATE queries, the same way it does now for AutoFields.

You can find an implementation of this approach at https://github.com/mpessas/django/commit/a663d8f95d9cca5112ccd33e3a303cae82908b60 and a demo project at https://github.com/mpessas/dbfield (timestamp app).

Change History (17)

comment:1 Changed 22 months ago by mpessas

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to mpessas
  • Patch needs improvement unset
  • Status changed from new to assigned

Some initial comments can be found at https://groups.google.com/forum/#!topic/django-developers/rwP2DcPYHEs.

I agree with making use_on_insert and use_on_update instance variables instead, but my main concern is that this probably makes the variables as important as the rest of the aguments the field can take. However, I suppose most people should stay away from them. That is why I wanted to make them a bit more internal to the field class. But I guess this is actually a matter of how you document things. If the patch is merged, should there be documentation in the Field reference page?

By the way, there is one thing I forgot to mention in my initial email, that since the value of a field is not created from within django, the object created or updated will not have the database-assigned value. For instance, if there is a created field that takes its value from the database, django will know that the value of the field is None and you will need to read the model instance again to fetch the correct values. But I am sure that people who need this would not mind. In any case, this is another reason why I think this should be more of an internal thing (at least as far as documentation is concerned).

Regarding save, yes this is a way to achieve a similar effect now, but only for UPDATEs.

comment:2 follow-up: Changed 22 months ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.

Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields

Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.

In addition, you will need also ModelForms support.

In general I see this feature as useful when you need it, so marking as accepted.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 22 months ago by mpessas

Replying to akaariai:

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.

Regarding the RETURNING clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING clause, when available, and fallback to a re-read query otherwise.

In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING clause is not possible. How does that sound?

Validation code is needed to check that primary keys aren't use_on_insert/use_on_update.

Done at https://github.com/mpessas/django/commit/eda1b860311496d37bd6609b710f69fa4ae6fc41.

Should there be a way to override the database assignment? How about .update(modified_field=somevalue)? The simplest solution seems to be to not allow override in .save(), but allow .update() of db default fields

Yes, that was my idea as well. The .update() method works as expected (see https://github.com/mpessas/django/commit/4fcfed999e82b08182b6c614553e7d707fdf4a6d).

Fixtures loading should probably force-insert/update the fixture's value, not use database default. This is doable by checking that if raw=True in save, then insert everything as-is and don't do auto-update.

Fixed at https://github.com/mpessas/django/commit/7540af0471346d62e60f1c61c27638b3233d4d5b. However, I am not familiar with the process and as a result confident with the commit; are we sure that the raw flag is used only for fixture loading? Should I research a bit more?

In addition, you will need also ModelForms support.

I suppose you mean that the fields should be excluded by default in ModelForms.

I can think of three ways to do it:

  • Do it in fields_for_model, that is, exclude all use_on_insert/use_on_update fields, unless they are explicitly included in the fields variable of the form. The big issue here is that you cannot tell, whether the form will be used to create or update a row in order to handle use_on_insert and use_on_update fields differently; you have to exclude both. See commit https://github.com/mpessas/django/commit/3d298a6f269e28aa77e290aa4299cb17e3a6185b.
  • Do it in the form's initializer. In this case, you do have access to the instance variable; thus, you can tell whether the form is going to create or update a row. But this would not be consistent; the developer would get a different set of fields in the two cases. Moreover, the field selection process is part of the metaclass; as a result, the code to exclude the db-default fields belongs to the metaclass as well.
  • Do it in the __call__ method of the metaclass. This keeps all field-selection code in the metaclass, but still has the problem of consistency.

I am not sure which choice is best. The first one seems simpler to me (from the point of view of the user of a ModelForm), since it avoids the inconsistency issue (which would also leak to templates etc). Any opinions here?

In general I see this feature as useful when you need it, so marking as accepted.

Thanks!

comment:4 in reply to: ↑ 3 ; follow-up: Changed 21 months ago by mpessas

Replying to mpessas:

Replying to akaariai:

It would be really useful to do autorefresh after save. Access to database set values could also be done with RETURNING, but for first implementation just executing a query from .save() would be enough.

I would rather avoid the extra query to re-read the db-default values; it would have a significant impact on performance, when one of the goals was to improve performance in the first place.

Regarding the RETURNING clause, it is not part of the SQL standard AFAIK (see http://www.postgresql.org/docs/9.1/static/sql-update.html#AEN76402) and that's the reason I avoided it. That said, can we have backend-specific codepaths in Django? That is, use the RETURNING clause, when available, and fallback to a re-read query otherwise.

In any case, I suppose we should allow the developer to skip the re-read query, if using the RETURNING clause is not possible. How does that sound?

Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False (which is the default).

There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.

Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?

comment:5 in reply to: ↑ 4 ; follow-up: Changed 21 months ago by manfre

Replying to mpessas:

Check commit https://github.com/mpessas/django/commit/dac93937c66040cb901e5e48bd6deba5ce51de9e. If the backend supports the RETURNING clause, we use that to fetch the values the database has given to the db-default fields. Otherwise, an extra SELECT query is issued. If the user does not want this overhead, he can set force_fetch=False (which is the default).

There are two main issues here: I have not checked yet how this works, when Model inheritance is used and the Oracle backend is broken.

Regarding the latter, I have not found any helpful docs about the RETURNING clause and I cannot test it either. Is there someone I could ask or should I contact the cx_oracle developers directly?

.

It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.

MSSQL supports the OUTPUT clause, which is similar to RETURNING, but not exactly the same. Once I add Django 1.7 support to django-mssql, I'll test out your branch. The primary database I develop django-mssql against is a heavy user of readonly fields (computed fields, populated with triggers/SSIS, etc). We've made due with a Manager that lets me define the readonly fields and having to manually refresh the instance after certain inserts.

comment:6 Changed 21 months ago by manfre

This ticket seems to only address fields that are ALWAYS updated by a trigger, but not fields that MAY be updated by a trigger (or something else working in/against the database), but doesn't necessarily require blocking changes on either insert or update.

This scenario seems like it should be covered by this ticket too. If so, another field argument always_refresh could handle that. When always_refresh=True, it would instruct the ORM to fetch the field after an insert or update with either use of a RETURNING clause or a separate fetch. When always_refresh=False, it would have no action, unless use_on_insert=False or use_on_update=False.

A third field argument feels excessive, especially since you'd need to define all three for the common case of a read-only field. The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:

DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False

comment:7 Changed 21 months ago by manfre

  • Cc mmanfre@… added

comment:8 follow-up: Changed 21 months ago by shai

  • Cc shai added
  • Keywords oracle added

@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.

comment:9 in reply to: ↑ 5 Changed 20 months ago by mpessas

Replying to manfre:

It's possible that a database backend is capable of returning the IDs from an insert without supporting the RETURNING clause. Those should be separate features. I previously injected a select statement to return the inserted ID without needing a second round trip to the MSSQL server. It's also possible that certain database drivers may return newly inserted IDs without SQL.

True, this has been implemented at https://github.com/mpessas/django/commit/c7ca041347b9c8b24285cb0c94ba3eefeded707e.

Replying to manfre:

The use_on_insert and use_on_update arguments could get merged in to an edit_behavior argument (probably needs a better name) that expects one of the following constant values:

DEFAULT - use_on_insert == True and use_on_update == True
CREATE_ONLY - use_on_insert == True and use_on_update == False
UPDATE_ONLY - use_on_insert == False and use_on_update == True
READ_ONLY - use_on_insert == False and use_on_update == False

That is a good idea, see commit https://github.com/mpessas/django/commit/a8a19b65774383bc47d3fb04ca6966ba4a4cba94 (a better name is welcome).

The always_refresh feature sounds interesting and I will take a look; it should not be that hard.

Keep in mind, by the way, that as far as I can tell and at least in PostgreSQL the RETURNING clause does not "work" with an after trigger; that is, if a field is changed by an AFTER INSERT or AFTER UPDATE trigger will not be returned with a RETURNING clause.

comment:10 in reply to: ↑ 8 Changed 20 months ago by mpessas

Replying to shai:

@mpessas: With all things Oracle, you can always bug me. I only had time to glimpse at your patch now, but I will happily work with you to make sure that, by the time it is committed, Oracle is not broken.

Great, thanks! Right now the Oracle backend uses a second query to read any updated values, even though it supports the RETURNING INTO clause. However, I am not that familiar with Oracle to be sure how to make this work (or test anything).

As far as I can tell, Oracle needs to create "bind" variables with a given type. Would it work, if I used a list of Fields instead of names (right now the list of returning values is a list of field names)? That way, I guess I should be (somehow, maybe via the db_type property) able to figure out the type of each field in the returning values list. How does this sound? I could work on this, but like I said I have no way to actually test any changes.

By the way, you can email me directly, if we need to coordinate this; it is a gmail account with this username.

comment:11 Changed 20 months ago by shai

I commented on the PR about schema editing and nulls-vs-empty-string, both in tests. I also started working on the fixes needed for the Oracle backend. It's a little more work than I thought, including a need to define, for each of the built-in field types, the appropriate cx_Oracle data type -- which we were able to avoid so far.

In turn, this means that, for proper testing, we'll probably need to add tests that return each type of field.

comment:12 Changed 20 months ago by mpessas

The issues have been fixed so far. If I can help with writing the tests, let me know.

Just for reference, the two issues I have not looked so far are how the changes work with model inheritance (probably there are no more changes needed) and the always_refresh option described above.

comment:13 Changed 20 months ago by mpessas

I meant, the issues mentioned in the pull request. Sorry.

comment:14 Changed 20 months ago by mpessas

I just took a look at model inheritance and _save_parents calls _save_table as well, so everything should work in this use-case.

comment:15 Changed 20 months ago by shai

#19527, dealing with returned ids in bulk_create() is related, at least as far as Oracle is concerned.

comment:16 Changed 20 months ago by shai

I made some comments on the PR (I don't think it is properly linked here -- https://github.com/django/django/pull/2149 -- now it is). Additionally, I added a commit on a branch based on yours that makes Oracle pass all the model_fields and model_forms tests, at https://github.com/shaib/django/tree/auto_fields (sorry, I don't have time right now to run the full suite. Maybe later today).

W.r.t to the tests failing because Oracle returns null strings as empty, I just changed the relevant assertIsNone calls to assertFalse. Hope that is ok too.

comment:17 Changed 19 months ago by timo

  • Patch needs improvement set
Note: See TracTickets for help on using tickets.
Back to Top