Opened 12 years ago

Closed 9 years ago

Last modified 9 years ago

#18556 closed Cleanup/optimization (fixed)

Improve query efficiency of .add() on a reverse foreign key

Reported by: Alex Gaynor Owned by: Tim Graham
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

It does one query per object received, instead of just one for the entire batch. Attached is a patch which fixes this. It's technically not backwards compatible because signals are no longer sent, however one could artificially send them. Also, in the event of an error, none of the objects will be modified, whereas currently some of them will be.

Attachments (2)

t18556-remove-efficiency.diff (2.5 KB ) - added by Alex Gaynor 12 years ago.
18556.diff (3.3 KB ) - added by Tim Graham 12 years ago.

Download all attachments as: .zip

Change History (16)

by Alex Gaynor, 12 years ago

comment:1 by Anssi Kääriäinen, 12 years ago

Has patch: set

+1 for this change. There are already situations where the signals framework doesn't catch all changes. Using .update() in this situation seems natural. On the other hand, I do not actually use signals...

If we wanted to keep signals for this, then we should have some sort of pre/post update signal which one could use here. The signal would likely have an argument of "objs" which would be lazily fetched from the DB - if nobody access the objs then no extra work done.

The patch itself looks good to me.

by Tim Graham, 12 years ago

Attachment: 18556.diff added

comment:2 by Tim Graham, 12 years ago

I think it makes sense not to send any signals for consistency with how clear() works, plus it's going to be backwards-incompatible to some extent with save() no longer being called. I've added documentation, including a note to the backwards-incompatible changes for 1.6.

comment:3 by Carl Meyer, 12 years ago

The closest parallel to remove() in API terms is not clear(), it's add(). There is no consistency gained by having add() continue to send signals and remove() not. If we're going to do this, the same approach should be used for both add() and remove() (there's also no reason for having an efficiency difference between them).

There's also another backwards incompatibility here; the passed-in object instances themselves are no longer updated. This looks trivial to fix; just restore the setattr line in the loop.

comment:4 by Tim Graham, 12 years ago

Thanks Carl, good point. I'll be happy to update add() as well. Are you +1 on removing legal signals for these operations or do you think it needs a discussion on django-developers?

Last edited 10 years ago by johnhomes (previous) (diff)

comment:5 by Carl Meyer, 12 years ago

Yeah, I didn't comment on that in my first comment because I'm not sure :-) I think it will in all likelihood break real code if we stop sending these signals here. On the other hand, I also agree with Alex and Anssi that the current implementation is dumb, update() makes way more sense, and that it would be better to have an update signal.

I wouldn't say I'm a +1, but I'm not -1 either. Somewhere in the zero range, I guess :-) I'm not gonna stand in the way.

comment:6 by Anssi Kääriäinen, 12 years ago

There is also the possibility to have fast-path of .update() for no listeners case, and loop when there are listeners. The complexity of dual paths isn't big at all, and it is worth some complexity when one can save possibly large amounts of DB resources.

The code as written doesn't do just one query per obj, it does two. .save() will select, then update for each object. This is typical example case where either using update_fields, or applying #16649 would help.

comment:7 by Tim Graham, 12 years ago

For add(), it looks like we need to at least keep the possibility of executing save() for an object that hasn't been saved yet. See this test https://github.com/django/django/blob/master/tests/modeltests/many_to_one/tests.py#L49-L65

Here's what I came up with so far. Tests pass except for multiple_database. I'm guessing self.model.objects.filter(pk__in=ids).update(**{rel_field.name: self.instance}) needs a slightly different implementation to work with multi-db, but I'm not sure what that would be.

https://github.com/timgraham/django/commit/d6b2720138e53cde0e8f60471e8d92f444e87474

comment:8 by Anssi Kääriäinen, 12 years ago

The multidb issue is likely due to having the object saved (thus PK set), but into different DB.

Just checking the pk is set isn't enough, consulting the model._state.db + _state.adding would likely yield the correct result. If different than the current DB or adding, then save, else go to update() directly.

This seems to also raise a possible race condition. Assume thread T1 fetches the object from DB, then T2 deletes it, and then T1 issues add(). The result was that the object was resaved to DB, after patch it is that it remains deleted. In the add() case the correct behavior is resave so that after add you can trust that relation actually contains all the objects in the DB.

The race could most reliably be resolved by doing an UPDATE ... RETURNING PK. Check which PKs were updated, those that weren't must be resaved. Unfortunately RETURNING isn't available in MySQL or SQLite, so this idea would only apply to those DBs supporting returning. Others could of course do update(); values_list('pk'); save those not existing in the values_list separately.

Completely another matter is the race condition actually matters in real world situations...

comment:9 by Tim Graham, 11 years ago

Patch needs improvement: set

comment:10 by Tim Graham, 10 years ago

Owner: changed from nobody to Tim Graham
Patch needs improvement: unset
Status: newassigned
Summary: .remove() on a reverse foreign key executes too many queriesImprove query efficiency of .add() on a reverse foreign key
Version: 1.4master

It seems the issue with remove() has since been dealt with by adding a bulk parameter. I updated my original patch and fixed the multidb problems as suggested by Anssi.

comment:11 by Tim Graham, 10 years ago

Patch needs improvement: set

Loic would like the same bulk parameter for add(). Also GenericForeignKey needs to be updated with the same behavior.

comment:12 by Tim Graham, 10 years ago

Patch needs improvement: unset

comment:13 by Loïc Bistuer <loic.bistuer@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In adc0c4fb:

Fixed #18556 -- Allowed RelatedManager.add() to execute 1 query where possible.

Thanks Loic Bistuer for review.

comment:14 by Simon Charette <charette.s@…>, 9 years ago

In 6a46f23:

Refs #18556 -- Fixed a typo in the related manager add() method docs.

Note: See TracTickets for help on using tickets.
Back to Top