#18556 closed Cleanup/optimization (fixed)
Improve query efficiency of .add() on a reverse foreign key
Reported by: | Alex Gaynor | Owned by: | Tim Graham |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
It does one query per object received, instead of just one for the entire batch. Attached is a patch which fixes this. It's technically not backwards compatible because signals are no longer sent, however one could artificially send them. Also, in the event of an error, none of the objects will be modified, whereas currently some of them will be.
Attachments (2)
Change History (16)
by , 12 years ago
Attachment: | t18556-remove-efficiency.diff added |
---|
comment:1 by , 12 years ago
Has patch: | set |
---|
by , 12 years ago
Attachment: | 18556.diff added |
---|
comment:2 by , 12 years ago
I think it makes sense not to send any signals for consistency with how clear()
works, plus it's going to be backwards-incompatible to some extent with save()
no longer being called. I've added documentation, including a note to the backwards-incompatible changes for 1.6.
comment:3 by , 12 years ago
The closest parallel to remove()
in API terms is not clear()
, it's add()
. There is no consistency gained by having add()
continue to send signals and remove()
not. If we're going to do this, the same approach should be used for both add()
and remove()
(there's also no reason for having an efficiency difference between them).
There's also another backwards incompatibility here; the passed-in object instances themselves are no longer updated. This looks trivial to fix; just restore the setattr
line in the loop.
comment:4 by , 12 years ago
Thanks Carl, good point. I'll be happy to update add()
as well. Are you +1 on removing legal signals for these operations or do you think it needs a discussion on django-developers?
comment:5 by , 12 years ago
Yeah, I didn't comment on that in my first comment because I'm not sure :-) I think it will in all likelihood break real code if we stop sending these signals here. On the other hand, I also agree with Alex and Anssi that the current implementation is dumb, update()
makes way more sense, and that it would be better to have an update signal.
I wouldn't say I'm a +1, but I'm not -1 either. Somewhere in the zero range, I guess :-) I'm not gonna stand in the way.
comment:6 by , 12 years ago
There is also the possibility to have fast-path of .update() for no listeners case, and loop when there are listeners. The complexity of dual paths isn't big at all, and it is worth some complexity when one can save possibly large amounts of DB resources.
The code as written doesn't do just one query per obj, it does two. .save() will select, then update for each object. This is typical example case where either using update_fields, or applying #16649 would help.
comment:7 by , 12 years ago
For add()
, it looks like we need to at least keep the possibility of executing save()
for an object that hasn't been saved yet. See this test https://github.com/django/django/blob/master/tests/modeltests/many_to_one/tests.py#L49-L65
Here's what I came up with so far. Tests pass except for multiple_database. I'm guessing self.model.objects.filter(pk__in=ids).update(**{rel_field.name: self.instance})
needs a slightly different implementation to work with multi-db, but I'm not sure what that would be.
https://github.com/timgraham/django/commit/d6b2720138e53cde0e8f60471e8d92f444e87474
comment:8 by , 12 years ago
The multidb issue is likely due to having the object saved (thus PK set), but into different DB.
Just checking the pk is set isn't enough, consulting the model._state.db + _state.adding would likely yield the correct result. If different than the current DB or adding, then save, else go to update() directly.
This seems to also raise a possible race condition. Assume thread T1 fetches the object from DB, then T2 deletes it, and then T1 issues add(). The result was that the object was resaved to DB, after patch it is that it remains deleted. In the add() case the correct behavior is resave so that after add you can trust that relation actually contains all the objects in the DB.
The race could most reliably be resolved by doing an UPDATE ... RETURNING PK. Check which PKs were updated, those that weren't must be resaved. Unfortunately RETURNING isn't available in MySQL or SQLite, so this idea would only apply to those DBs supporting returning. Others could of course do update(); values_list('pk'); save those not existing in the values_list separately.
Completely another matter is the race condition actually matters in real world situations...
comment:9 by , 11 years ago
Patch needs improvement: | set |
---|
comment:10 by , 10 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
Summary: | .remove() on a reverse foreign key executes too many queries → Improve query efficiency of .add() on a reverse foreign key |
Version: | 1.4 → master |
It seems the issue with remove()
has since been dealt with by adding a bulk parameter. I updated my original patch and fixed the multidb problems as suggested by Anssi.
comment:11 by , 10 years ago
Patch needs improvement: | set |
---|
Loic would like the same bulk
parameter for add()
. Also GenericForeignKey
needs to be updated with the same behavior.
comment:12 by , 10 years ago
Patch needs improvement: | unset |
---|
+1 for this change. There are already situations where the signals framework doesn't catch all changes. Using .update() in this situation seems natural. On the other hand, I do not actually use signals...
If we wanted to keep signals for this, then we should have some sort of pre/post update signal which one could use here. The signal would likely have an argument of "objs" which would be lazily fetched from the DB - if nobody access the objs then no extra work done.
The patch itself looks good to me.