Opened 5 years ago

Closed 2 years ago

Last modified 2 years ago

#18556 closed Cleanup/optimization (fixed)

Improve query efficiency of .add() on a reverse foreign key

Reported by: Alex Gaynor Owned by: Tim Graham
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


It does one query per object received, instead of just one for the entire batch. Attached is a patch which fixes this. It's technically not backwards compatible because signals are no longer sent, however one could artificially send them. Also, in the event of an error, none of the objects will be modified, whereas currently some of them will be.

Attachments (2)

t18556-remove-efficiency.diff (2.5 KB) - added by Alex Gaynor 5 years ago.
18556.diff (3.3 KB) - added by Tim Graham 4 years ago.

Download all attachments as: .zip

Change History (16)

Changed 5 years ago by Alex Gaynor

comment:1 Changed 5 years ago by Anssi Kääriäinen

Has patch: set

+1 for this change. There are already situations where the signals framework doesn't catch all changes. Using .update() in this situation seems natural. On the other hand, I do not actually use signals...

If we wanted to keep signals for this, then we should have some sort of pre/post update signal which one could use here. The signal would likely have an argument of "objs" which would be lazily fetched from the DB - if nobody access the objs then no extra work done.

The patch itself looks good to me.

Changed 4 years ago by Tim Graham

Attachment: 18556.diff added

comment:2 Changed 4 years ago by Tim Graham

I think it makes sense not to send any signals for consistency with how clear() works, plus it's going to be backwards-incompatible to some extent with save() no longer being called. I've added documentation, including a note to the backwards-incompatible changes for 1.6.

comment:3 Changed 4 years ago by Carl Meyer

The closest parallel to remove() in API terms is not clear(), it's add(). There is no consistency gained by having add() continue to send signals and remove() not. If we're going to do this, the same approach should be used for both add() and remove() (there's also no reason for having an efficiency difference between them).

There's also another backwards incompatibility here; the passed-in object instances themselves are no longer updated. This looks trivial to fix; just restore the setattr line in the loop.

comment:4 Changed 4 years ago by Tim Graham

Thanks Carl, good point. I'll be happy to update add() as well. Are you +1 on removing legal signals for these operations or do you think it needs a discussion on django-developers?

Last edited 2 years ago by johnhomes (previous) (diff)

comment:5 Changed 4 years ago by Carl Meyer

Yeah, I didn't comment on that in my first comment because I'm not sure :-) I think it will in all likelihood break real code if we stop sending these signals here. On the other hand, I also agree with Alex and Anssi that the current implementation is dumb, update() makes way more sense, and that it would be better to have an update signal.

I wouldn't say I'm a +1, but I'm not -1 either. Somewhere in the zero range, I guess :-) I'm not gonna stand in the way.

comment:6 Changed 4 years ago by Anssi Kääriäinen

There is also the possibility to have fast-path of .update() for no listeners case, and loop when there are listeners. The complexity of dual paths isn't big at all, and it is worth some complexity when one can save possibly large amounts of DB resources.

The code as written doesn't do just one query per obj, it does two. .save() will select, then update for each object. This is typical example case where either using update_fields, or applying #16649 would help.

comment:7 Changed 4 years ago by Tim Graham

For add(), it looks like we need to at least keep the possibility of executing save() for an object that hasn't been saved yet. See this test

Here's what I came up with so far. Tests pass except for multiple_database. I'm guessing self.model.objects.filter(pk__in=ids).update(**{ self.instance}) needs a slightly different implementation to work with multi-db, but I'm not sure what that would be.

comment:8 Changed 4 years ago by Anssi Kääriäinen

The multidb issue is likely due to having the object saved (thus PK set), but into different DB.

Just checking the pk is set isn't enough, consulting the model._state.db + _state.adding would likely yield the correct result. If different than the current DB or adding, then save, else go to update() directly.

This seems to also raise a possible race condition. Assume thread T1 fetches the object from DB, then T2 deletes it, and then T1 issues add(). The result was that the object was resaved to DB, after patch it is that it remains deleted. In the add() case the correct behavior is resave so that after add you can trust that relation actually contains all the objects in the DB.

The race could most reliably be resolved by doing an UPDATE ... RETURNING PK. Check which PKs were updated, those that weren't must be resaved. Unfortunately RETURNING isn't available in MySQL or SQLite, so this idea would only apply to those DBs supporting returning. Others could of course do update(); values_list('pk'); save those not existing in the values_list separately.

Completely another matter is the race condition actually matters in real world situations...

comment:9 Changed 4 years ago by Tim Graham

Patch needs improvement: set

comment:10 Changed 2 years ago by Tim Graham

Owner: changed from nobody to Tim Graham
Patch needs improvement: unset
Status: newassigned
Summary: .remove() on a reverse foreign key executes too many queriesImprove query efficiency of .add() on a reverse foreign key
Version: 1.4master

It seems the issue with remove() has since been dealt with by adding a bulk parameter. I updated my original patch and fixed the multidb problems as suggested by Anssi.

comment:11 Changed 2 years ago by Tim Graham

Patch needs improvement: set

Loic would like the same bulk parameter for add(). Also GenericForeignKey needs to be updated with the same behavior.

comment:12 Changed 2 years ago by Tim Graham

Patch needs improvement: unset

comment:13 Changed 2 years ago by Loïc Bistuer <loic.bistuer@…>

Resolution: fixed
Status: assignedclosed

In adc0c4fb:

Fixed #18556 -- Allowed RelatedManager.add() to execute 1 query where possible.

Thanks Loic Bistuer for review.

comment:14 Changed 2 years ago by Simon Charette <charette.s@…>

In 6a46f23:

Refs #18556 -- Fixed a typo in the related manager add() method docs.

Note: See TracTickets for help on using tickets.
Back to Top