Code

Opened 3 years ago

Closed 22 months ago

Last modified 22 months ago

#15566 closed Bug (fixed)

.update queries should affect Date*Field's with auto_now

Reported by: jdunck Owned by: agiliq
Component: Documentation Version: 1.2
Severity: Normal Keywords:
Cc: streeter, charette.s@…, anssi.kaariainen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

Right now, calling Manager.update will update records without affecting date fields w/ auto_now specified.

This is a bug. Perhaps UpdateQuery should generate an update to any fields with auto_now.

Attachments (1)

ticket_15566_test.patch (2.4 KB) - added by bberes 3 years ago.

Download all attachments as: .zip

Change History (21)

Changed 3 years ago by bberes

comment:1 Changed 3 years ago by bberes

  • Has patch set
  • Owner changed from nobody to bberes
  • Patch needs improvement set
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

Yes, I think it's reasonable to expect auto_now=True to work with .update().
Just attached a basic regressiontest for now.

comment:2 follow-up: Changed 3 years ago by lrekucki

  • Triage Stage changed from Accepted to Design decision needed

It's a backwards incompatible change and there are a lot of people depending on it. Quoting the docs:

Be aware that the update() method is converted directly to an SQL statement. It is a bulk operation for direct updates. It doesn't run any save() methods on your models, or emit the pre_save or post_save signals (which are a consequence of calling save()).

It doesn't explicitly refer to the auto_now option, but that was always a hack, so you don't have to implement it on pre_save yourself. I would be in favour of leaving the current behaviour and add a documentation note to docs about how it works with auto_now.

comment:3 Changed 3 years ago by jdunck

Sure, a decision on whether to keep auto_now needs to be made in general, but either way, that's a 2.x decision. If they are kept, I think this is a good feature at that point.

comment:4 Changed 3 years ago by lukeplant

  • Severity set to Normal

comment:5 Changed 3 years ago by lukeplant

  • Type set to Bug

comment:6 Changed 3 years ago by aaugustin

  • Component changed from Database layer (models, ORM) to Documentation
  • Easy pickings set
  • Has patch unset
  • Patch needs improvement unset
  • Triage Stage changed from Design decision needed to Accepted
  • UI/UX unset

I think we should keep the current behavior and document it, as explained in comment 2.

comment:7 in reply to: ↑ 2 Changed 2 years ago by carljm

  • Component changed from Documentation to Database layer (models, ORM)

Replying to lrekucki:

It's a backwards incompatible change and there are a lot of people depending on it. Quoting the docs:

Be aware that the update() method is converted directly to an SQL statement. It is a bulk operation for direct updates. It doesn't run any save() methods on your models, or emit the pre_save or post_save signals (which are a consequence of calling save()).

It doesn't explicitly refer to the auto_now option, but that was always a hack, so you don't have to implement it on pre_save yourself. I would be in favour of leaving the current behaviour and add a documentation note to docs about how it works with auto_now.

This seems like a weak argument to me. As acknowledged, the docs say nothing about auto_now and qs.update(). If one reads all the relevant docs, I think one would reasonably assume that auto_now should work in case of a queryset update too, and I see no technical argument against it doing so. To argue that the feature should remain broken when it could be fixed because it was "always a hack" makes no sense to me. If the feature is a hack in a seriously bad sense, then it should either be removed, or fixed so that it is no longer a hack.

I think that this is a bug, and we should fix bugs, not document them, even if someone might be depending on the buggy behavior. I have a hard time imagining the case where someone puts auto_now on a field but is relying on that field *not* being updated in case of a bulk update, or how this fix would break that code.

comment:8 Changed 2 years ago by anonymous

I've pasted a basic implementation (no patch, since I did this the subqueries.py of my site-packages vs my local src copy, and because of the timezone stuff that I'm not sure how to handle).

https://gist.github.com/1872523#L83

I've tested this locally, in case it's OK as-is, and somebody wants to make a patch an add tests.

comment:9 Changed 2 years ago by streeter

  • Cc streeter added

comment:10 Changed 2 years ago by anonymous

Replying to carljm:

This seems like a weak argument to me.

I 100% agree with Carl here. I don't know where the contention regarding auto_now and auto_now_add came from, but it's completely unwarranted and ridiculous. It's a perfectly valid feature with a perfectly sane and non-hacky implementation. Everyone wants timestamps for updated_on and created_on at some point in their project, and this offers a way to do it on a base model without having to use signals or anything else (which I would deem to be overkill).

comment:11 Changed 2 years ago by charettes

  • Cc charette.s@… added

comment:12 Changed 2 years ago by lukeplant

I disagree with Carl on this one. I have always interpreted QuerySet.update() as a lower-level SQL construct, and would be pretty surprised if it did anything other than update the fields that I explicitly told it to.

I think the documentation is ambiguous - it simply doesn't state how QuerySet.update() and auto_now interact. I had interpreted it the other way to you - that update() does a SQL update of exactly the fields you specify, and nothing more. I think either interpretation is valid, and it is up to us to decide how to change the behaviour or clarify the docs.

I can think of ways the proposed change could break people's code. If I had an auto_now field that was used to essentially track the last modified timestamp of 'public' data in table, and then added some 'internal' state data to a table, I might make use of the current behaviour of QuerySet.update() to avoid triggering the auto_now behaviour (or anything else that might normally run on model save()). In fact I have coded some models where I have a split just like this. I can't remember if they had auto_now fields, but I wouldn't have wanted them to be updated when the internal fields were changed.

In addition, with the proposed change it's hard to see how you would execute a SQL "update-these-fields-and-only-these-fields-and-don't-do anything-special" command. I had relied on QuerySet.update() for that until now. It's not hard to add a field to the update call, it's harder to remove one.

comment:13 follow-up: Changed 2 years ago by akaariai

  • Cc anssi.kaariainen@… added

One more reason for wontfixing this: the pre/post save signals are not sent for .update() queries. This means that audit-logging .update() calls is impossible for example. So, what does this have to do with this issue? It makes the argument stronger that .update() is a lower level API.

I think this should be clearly documented: .update() updates _only_ the fields specified. It does not do _anything_ else. I haven't checked if this is already in the documentation, but it should be made absolutely clear that Django does not make sure you are able to log every change done to the models by signals, or by auto_now.

comment:14 in reply to: ↑ 13 Changed 2 years ago by carljm

  • Component changed from Database layer (models, ORM) to Documentation

I agree that this could be read either way. I don't think a Field argument is parallel to signals in terms of "API level"; it's closer to parallel to an overridden save. I just noticed that update doesn't even respect e.g. max_length (you just get a raw database error/warning, depending on your backend); if that's not considered a bug, then that does strengthen the argument that update is pretty low-level.

In any case, I don't feel strongly, and I'm swayed by Luke's argument that you can easily add your timestamp field to an update call if you want it, but you can't easily remove it if you want a way to bulk-update exactly the specified fields and no others. Changing component back to Documentation.

comment:15 follow-up: Changed 2 years ago by anonymous

If Queryset.update is a lower-level SQL construct, then why do we have cursor.execute which could achieve the same thing in almost the same amount of SLOC and still remain perfectly injection-proof?

Queryset.update is used widely in most of the apps I've worked on, and I'm pretty certain the developers who've added it didn't think of it as a low-level tool, but rather just an ORM method for updating multiple rows at the same time without having to select each and save each separately (ie, a simply efficiency tool in the ORM).

Allowing Queryset.update to cause database max_length errors and break the auto_now docs is a bug and serves no helpful purpose, because A) If I want to demonstrate a Postgres error, there is cursor.execute, and B) If a user wants to override the documented auto_now functionality during a Queryset.update, they can simply provide an override value, and C) If a user wants to update specific rows while not leaving any evidence behind, they can simply use cursor.execute to update only the rows they want to update. The use-case that Luke suggested is simply an exploit of the bug and is probably a rarity in most code-bases because since auto_now also implies editable=False, it sort-of also implies that auto_now is more of a back-end feature.

We don't have a documented Queryset.insert method for performing raw INSERT queries, or, for that matter, any documented Queryset methods which are data-altering, low-level SQL constructs. In fact, the only documented methods that are low-level seem to provide all the functionality necessary. Queryset.raw and cursor.execute provide the ability to perform low level SQL queries against the database to both retrieve and modify data.

I think that creating an inconsistency in Queryset.update with support predicated only on the potential for previous exploitation of an undocumented bug seems like a really bad idea.

To show I'm not just ranting, I'll be willing to write the necessary patch and corresponding tests with some guidance on how to handle both the aforementioned cases.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 2 years ago by russellm

Replying to anonymous:

If Queryset.update is a lower-level SQL construct, then why do we have cursor.execute which could achieve the same thing in almost the same amount of SLOC and still remain perfectly injection-proof?

Because .update() is database agnostic; raw SQL isn't necessarily cross-database compatible. I'd also argue that the syntax for something like a simple field value increment is cleaner in the ORM than in raw SQL, especially for a codebase that is principally written in ORM syntax.

Queryset.update is used widely in most of the apps I've worked on, and I'm pretty certain the developers who've added it didn't think of it as a low-level tool, but rather just an ORM method for updating multiple rows at the same time without having to select each and save each separately (ie, a simply efficiency tool in the ORM).

I can't speak to how the method is perceived in the wild -- that's a documentation issue -- but a quick inspection of the implementation of the parts involved points pretty quickly to the fact that it can't be interpreted as anything *but* a low level method.

auto_now_add is implemented as a pre-save, per-field modification. The fact that the value for this modification is easily evaluated, and the value for the field will be the same for all rows in the update is convenient, but doesn't solve the underlying issue -- that what we're dealing with is a field value that is computed on a per-row basis.

To point out the underlying problem - consider another type of field with a pre-save trigger; say, a CharField whose value is a computed checksum based on other fields and the timestamp. There's no way this could be expressed as a single update statement for multi-row updates.

The only way you could reconcile this with auto_now_add being fired on .update() is to consider the handling of auto_now_add to be a special case because it is a timestamp. I'm not convinced that special case is special enough to warrant breaking the rules.

To show I'm not just ranting, I'll be willing to write the necessary patch and corresponding tests with some guidance on how to handle both the aforementioned cases.

If you can provide a patch that works for pre-save updates *in the general case*, then I'm interested. Otherwise, what you're proposing is to make a special case of date and time fields, and that's not something I'm comfortable with.

I concur with Luke and Anssi; this is a documentation issue regarding the limitations and expectations of a call to .update().

comment:17 in reply to: ↑ 16 Changed 2 years ago by anonymous

To point out the underlying problem - consider another type of field with a pre-save trigger; say, a CharField whose value is a computed checksum based on other fields and the timestamp. There's no way this could be expressed as a single update statement for multi-row updates.

Oops, I see now, that points to a far worse inconsistency than what I perceived as a consequence of not patching Queryset.update. While I can think of at least one terribly Byzantinian way of performing at least general pre-save updates, I realize that it's impossible to perform something like a computed checksum or anything that would require a Python library on an update. Therefore, Queryset.update stands acquitted :( but I much appreciate the explanation, Russell.

comment:18 Changed 22 months ago by agiliq

  • Has patch set
  • Owner changed from bberes to agiliq
  • Status changed from assigned to new

comment:19 Changed 22 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In [dc01e41d2343386b481b983828fd861303877e81]:

Fixed #15566 - Documented that update() doesn't honor DateField.auto_now

Thanks Shabda Raaj for the draft patch.

comment:20 Changed 22 months ago by Tim Graham <timograham@…>

In [b0e2cb8e470acc552b4ed3fde3d5b1322b426bf2]:

[1.4.X] Fixed #15566 - Documented that update() doesn't honor DateField.auto_now

Thanks Shabda Raaj for the draft patch.

Backport of dc01e41d23 from master

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.