Code

Opened 6 years ago

Last modified 2 weeks ago

#6707 assigned Bug

Another implementation for ReverseManyRelatedObjectsDescriptor.__set__ method

Reported by: favo <favo@…> Owned by: loic84
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: kmike84@…, sfllaw@…, jblaine@…, bradley.ayers@…, flavio.curella@…, botondus@…, info@…, info@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Current ReverseManyRelatedObjectsDescriptor implement set method:

  1. clear all data
  2. set to new data.

This may cause a huge data changes, especially when using Django Admin, the default M2M field save_form_data() method use the set method to maintain m2m data from form:

    def save_form_data(self, instance, data):
        setattr(instance, self.attname, data)

In most case we just want to add remove a small set data from m2m relaionship list. This patch use another implementation, first it delete all exist rows but not in the new data list, then only add new data which not in database.

Attachments (2)

django_m2m_set_method.diff (819 bytes) - added by favo <favo@…> 6 years ago.
patch of set method
6707_m2m_set.diff (10.8 KB) - added by sfllaw 2 years ago.
Initial patch

Download all attachments as: .zip

Change History (34)

Changed 6 years ago by favo <favo@…>

patch of set method

comment:1 Changed 6 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

This change runs a whole extra database query, which is far from zero cost. So each approach is better in different circumstances and there's no way to tell which one will be better ahead of time. Thus a change like this would just be changing things, not a universal improvement.

A better approach would be to track what's changed by tracking the old values before updating them, which has been suggested in another ticket. I'm not convinced that's a good idea for all fields, but it might be worthwhile for the multi-valued relations like many-to-many.

comment:2 Changed 6 years ago by favo <favo@…>

Yes, as you said, it's depends. And it's true the patch need a extra full database query. However when you use it in some m2m widget like the admin ones, you already do a full database query to show all choices. And the patch don't change the exist data(don't like before remove but add later.), this make sense in some situation. In one of my project, we have lot's signal hook with m2m relationship changed(with the help of #5390), each receiver have to run a heavy database logic, so we need this patch do reduce unnecessary changes.
Anyway it's a balance, not a universal improvement, I write here, but part people need it.

mtredinnick, thanks for feedback. But I didn't find the ticket you mentioned?

comment:3 Changed 6 years ago by mtredinnick

The problem is there's no way to tell when this will be better than the current behaviour. So saying "it's an improvement somtimes" doesn't help. It's also a distinct unimprovement in other cases (and is not only used in the admin, so putting in something just for that usage isn't a strong argument). For that reason I'm not in favour of making this change because it's just changing which are the worse cases. It happens it would favour your uses, but somebody's always going to have the bad side of the deal with something like this.

#4102 is the ticket I was referring to, although I'm not sure the implementation there is great because of the additional overhead. However in some cases the general idea may make sense (e.g. m2m fields). Testing and profiling is needed.

comment:4 Changed 4 years ago by orcun avsar <orc.avs@…>

Current behaviour causes m2m_changed signal not to work properly (#13962). Most of times m2m_changed signal is used in Django admin. So m2m_changed signal is mostly useless at this state. Is there a way to detect if a m2m_changed is connected for a model field. This patch may be applied as another save method so Django admin may branch to that method if m2m_changed signal is connected to a receiver.

comment:5 Changed 3 years ago by gabrielhurley

  • Triage Stage changed from Design decision needed to Accepted

Given that 1.2 gained signals for ManyToMany relationships, it's worth fixing this properly now. Sending out the correct signals seems like a worthwhile tradeoff for the performance cost of implementation, whichever the correct approach may be.

comment:6 Changed 3 years ago by jonathanmorgan

  • Patch needs improvement set

Greetings,

I have some experience with this sort of thing, and would love for the m2m_changed signals to be correctly issued by the admin. I will download the patch and check it out to see what it does, how hard it would be to contribute a fix. One strategy I've used to minimize database access is to create a hidden input that is the list of a given set of relations that is passed with updates, so you can use it to see what has changed since the page was rendered (it can be string-delimited using multiple delimiters, akin to old Pick-style database records which had value marks and sub-value marks, to hold a list of all three keys in each M2M relation when the page was generated). If you did this, you could compare the submitted list against the data structure encoded in this hidden form input to check which needs to be removed, which are new, and which are unchanged. It definitely isn't perfectly thread-safe (though the alternative of hitting the database for that list isn't, either), but it could be an answer if limiting database access is preferable to creating a larger chance of a race condition. You could also add a check for the freshness of that list (a simple count of relations for the current instance might suffice?) and use the passed value if the check passes, or revert to hitting the database if the check fails. If I did this, I'd make the check a separate method (or would love guidance as to the preferred way of making something modular and override-able within django), so that we could refine the implementation of the freshness check later.

In looking at the patch, does the manager already invoke the signals we want invoked as part of its processing? Perhaps I could start by placing that code in my django instance and seeing how the signals behave. At the least, I'd probably add documentation to the patch, and break it out so each line contains a single code task (I must admit to not being very pythonic in that regard - if there is only one thing on a line, then you know if you change that line, you only change that thing - like adding calcium to bones - keeps them tough but flexible).

I have never contributed code to django before, so if this is premature, please help me understand the community norms of where I should begin contributing (probably by proving my follow-through-ness by completing the two documentation patches I agreed to write).

Thanks,

Jonathan Morgan

comment:7 Changed 3 years ago by julien

  • Type set to Cleanup/optimization

comment:8 Changed 3 years ago by julien

  • Severity set to Normal

comment:9 Changed 3 years ago by kmike

  • Cc kmike84@… added
  • Easy pickings unset
  • UI/UX unset

comment:10 Changed 3 years ago by sfllaw

  • Cc sfllaw@… added

comment:11 Changed 3 years ago by sfllaw

Because __set__() clears and adds, it does a lot of extra work when assignment happens, since each object in the new set has to be INSERTed in the database, even if it's just to add an extra item.

The extra query can be eliminated in _add_items(), which can be told that its list of *objs do not already exist so it doesn't have to check for them.

We've hit this problem with m2m_changed on any ModelForm, so it's not just limited to the admin site.

If there's interest, I can write up a patch that implements this.

comment:12 Changed 2 years ago by jblaine

  • Cc jblaine@… added

comment:13 Changed 2 years ago by jblaine

I'm interested in any patch that will give me access to the new data and operations performed when a model's m2m field is altered. Right now, m2m_changed is totally useless to me, and is actually dangerous as others have pointed out in other bugs related to this one.

Me: Remove 2 relationships from these 200 that are set.

Admin: Okay! I removed 200 things and added all 198 items back that did not match your 2! And I sent not one single pre_remove or post_remove signal''

Me: That's not at all what I said, and it totally matters how you go about it and that your signals+actions properly reflect what is being asked and not reflecting how you went about doing it.

I suspect it's no surprise that I consider this a bug, not a "cleanup/optimization" as it's now categorized :(

comment:14 Changed 2 years ago by carljm

  • Type changed from Cleanup/optimization to Bug

Closed #16073 (which was specifically about incorrect m2m signals in the admin, as a result of this) as duplicate. Marking this a bug rather than a cleanup/optimization; previous to the existence of m2m signals it may have been the latter, but at this point the signaling problem makes it a bug.

Changed 2 years ago by sfllaw

Initial patch

comment:15 Changed 2 years ago by sfllaw

  • Owner changed from nobody to sfllaw
  • Patch needs improvement unset
  • Status changed from new to assigned

6707_m2m_set.diff includes a patch that performs the minimum amount of DELETE and INSERT statements for the __set__ method. In addition, I have changed _add_items() so it does not do an extra SELECT if __set__ has already removed duplicates, so we do not perform an extra query.

This patch applies to SVN trunk.

comment:16 Changed 2 years ago by sfllaw

It occurred to me, while writing this patch, that the __set__ methods for ManyRelatedObjectsDescriptor and ReverseManyRelatedObjectsDescriptor are basically the same. Should those two Descriptors be unified into a base descriptor?

comment:17 Changed 2 years ago by Bradley Ayers <bradley.ayers@…>

  • Cc bradley.ayers@… added

This is fantastic, I just wanted to say thank-you very much for writing this latest patch!

comment:18 Changed 21 months ago by akaariai

I haven't tested this ticket, just skimmed the latest patch. To me it seems this is worth getting into Django, the savings can be pretty large if one regularly does small changes to m2m fields using list assignment.

comment:19 Changed 21 months ago by sfllaw

akaariai: Are there any improvements to this patch that you can suggest?

comment:20 Changed 21 months ago by akaariai

Not anything I spotted immediately. As said, I haven't gone through this in full detail. If you can get somebody to review the patch it would of course be a plus.

I will try to get time to work on this. I can't promise anything, though.

comment:21 Changed 21 months ago by sfllaw

It'd be great to get a core developer to look at whether the suggested refactoring above would be sensible.

comment:22 Changed 20 months ago by fcurella

  • Cc flavio.curella@… added

comment:23 Changed 17 months ago by bberes

  • Cc botondus@… added

Ran into this as well on multiple occasions.
This basically makes the m2m_changed signal almost useless :(

comment:24 Changed 12 months ago by ivan_virabyan

As I understand there is currently no way to determine which values were added to many2many field in the admin?

comment:25 Changed 8 months ago by rickvanderzwet

  • Cc info@… added

comment:26 Changed 5 months ago by a.krille@…

Any news on this?

We too are bidden by this bug (and it is a bug when a signal is documented but never actually emitted) and would love to see a proper fix.

comment:27 Changed 5 months ago by MarkusH

  • Cc info@… added

comment:28 Changed 3 weeks ago by anonymous

This bug was opened "6 years ago" and we are still around the "patch". What would it take to get this fixed?

comment:29 Changed 3 weeks ago by anonymous

Shouldn't at least the documentation for previous versions including version 1.6 be updated to not include the pre_remove and post_remove actions in the m2m-changed signal?

comment:30 Changed 3 weeks ago by loic84

  • Owner changed from sfllaw to loic84

comment:31 Changed 2 weeks ago by anonymous

Bitten too.

comment:32 Changed 2 weeks ago by loic84

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set

There's no documentation yet and I still want to add more tests, but the latest effort for this issue is at https://github.com/django/django/pull/2500.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
The owner will be changed from loic84 to anonymous. Next status will be 'assigned'
The ticket will be disowned. Next status will be 'new'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.