Opened 16 years ago

Closed 9 years ago

Last modified 8 years ago

#6707 closed Bug (fixed)

ManyToManyField clears and recreates all data

Reported by: favo <favo@…> Owned by: Loic Bistuer
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: kmike84@…, sfllaw@…, jblaine@…, bradley.ayers@…, flavio.curella@…, botondus@…, info@…, info@…, cmawebsite@…, toracle@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Current ReverseManyRelatedObjectsDescriptor implement set method:

  1. clear all data
  2. set to new data.

This may cause a huge data changes, especially when using Django Admin, the default M2M field save_form_data() method use the set method to maintain m2m data from form:

    def save_form_data(self, instance, data):
        setattr(instance, self.attname, data)

In most case we just want to add remove a small set data from m2m relaionship list. This patch use another implementation, first it delete all exist rows but not in the new data list, then only add new data which not in database.

Attachments (2)

django_m2m_set_method.diff (819 bytes ) - added by favo <favo@…> 16 years ago.
patch of set method
6707_m2m_set.diff (10.8 KB ) - added by Simon Law 12 years ago.
Initial patch

Download all attachments as: .zip

Change History (46)

by favo <favo@…>, 16 years ago

Attachment: django_m2m_set_method.diff added

patch of set method

comment:1 by Malcolm Tredinnick, 16 years ago

Triage Stage: UnreviewedDesign decision needed

This change runs a whole extra database query, which is far from zero cost. So each approach is better in different circumstances and there's no way to tell which one will be better ahead of time. Thus a change like this would just be changing things, not a universal improvement.

A better approach would be to track what's changed by tracking the old values before updating them, which has been suggested in another ticket. I'm not convinced that's a good idea for all fields, but it might be worthwhile for the multi-valued relations like many-to-many.

comment:2 by favo <favo@…>, 16 years ago

Yes, as you said, it's depends. And it's true the patch need a extra full database query. However when you use it in some m2m widget like the admin ones, you already do a full database query to show all choices. And the patch don't change the exist data(don't like before remove but add later.), this make sense in some situation. In one of my project, we have lot's signal hook with m2m relationship changed(with the help of #5390), each receiver have to run a heavy database logic, so we need this patch do reduce unnecessary changes.
Anyway it's a balance, not a universal improvement, I write here, but part people need it.

mtredinnick, thanks for feedback. But I didn't find the ticket you mentioned?

comment:3 by Malcolm Tredinnick, 16 years ago

The problem is there's no way to tell when this will be better than the current behaviour. So saying "it's an improvement somtimes" doesn't help. It's also a distinct unimprovement in other cases (and is not only used in the admin, so putting in something just for that usage isn't a strong argument). For that reason I'm not in favour of making this change because it's just changing which are the worse cases. It happens it would favour your uses, but somebody's always going to have the bad side of the deal with something like this.

#4102 is the ticket I was referring to, although I'm not sure the implementation there is great because of the additional overhead. However in some cases the general idea may make sense (e.g. m2m fields). Testing and profiling is needed.

comment:4 by orcun avsar <orc.avs@…>, 14 years ago

Current behaviour causes m2m_changed signal not to work properly (#13962). Most of times m2m_changed signal is used in Django admin. So m2m_changed signal is mostly useless at this state. Is there a way to detect if a m2m_changed is connected for a model field. This patch may be applied as another save method so Django admin may branch to that method if m2m_changed signal is connected to a receiver.

comment:5 by Gabriel Hurley, 13 years ago

Triage Stage: Design decision neededAccepted

Given that 1.2 gained signals for ManyToMany relationships, it's worth fixing this properly now. Sending out the correct signals seems like a worthwhile tradeoff for the performance cost of implementation, whichever the correct approach may be.

comment:6 by Jonathan Morgan, 13 years ago

Patch needs improvement: set

Greetings,

I have some experience with this sort of thing, and would love for the m2m_changed signals to be correctly issued by the admin. I will download the patch and check it out to see what it does, how hard it would be to contribute a fix. One strategy I've used to minimize database access is to create a hidden input that is the list of a given set of relations that is passed with updates, so you can use it to see what has changed since the page was rendered (it can be string-delimited using multiple delimiters, akin to old Pick-style database records which had value marks and sub-value marks, to hold a list of all three keys in each M2M relation when the page was generated). If you did this, you could compare the submitted list against the data structure encoded in this hidden form input to check which needs to be removed, which are new, and which are unchanged. It definitely isn't perfectly thread-safe (though the alternative of hitting the database for that list isn't, either), but it could be an answer if limiting database access is preferable to creating a larger chance of a race condition. You could also add a check for the freshness of that list (a simple count of relations for the current instance might suffice?) and use the passed value if the check passes, or revert to hitting the database if the check fails. If I did this, I'd make the check a separate method (or would love guidance as to the preferred way of making something modular and override-able within django), so that we could refine the implementation of the freshness check later.

In looking at the patch, does the manager already invoke the signals we want invoked as part of its processing? Perhaps I could start by placing that code in my django instance and seeing how the signals behave. At the least, I'd probably add documentation to the patch, and break it out so each line contains a single code task (I must admit to not being very pythonic in that regard - if there is only one thing on a line, then you know if you change that line, you only change that thing - like adding calcium to bones - keeps them tough but flexible).

I have never contributed code to django before, so if this is premature, please help me understand the community norms of where I should begin contributing (probably by proving my follow-through-ness by completing the two documentation patches I agreed to write).

Thanks,

Jonathan Morgan

comment:7 by Julien Phalip, 13 years ago

Type: Cleanup/optimization

comment:8 by Julien Phalip, 13 years ago

Severity: Normal

comment:9 by Mikhail Korobov, 13 years ago

Cc: kmike84@… added
Easy pickings: unset
UI/UX: unset

comment:10 by Simon Law, 13 years ago

Cc: sfllaw@… added

comment:11 by Simon Law, 13 years ago

Because __set__() clears and adds, it does a lot of extra work when assignment happens, since each object in the new set has to be INSERTed in the database, even if it's just to add an extra item.

The extra query can be eliminated in _add_items(), which can be told that its list of *objs do not already exist so it doesn't have to check for them.

We've hit this problem with m2m_changed on any ModelForm, so it's not just limited to the admin site.

If there's interest, I can write up a patch that implements this.

comment:12 by Jeff Blaine, 12 years ago

Cc: jblaine@… added

comment:13 by Jeff Blaine, 12 years ago

I'm interested in any patch that will give me access to the new data and operations performed when a model's m2m field is altered. Right now, m2m_changed is totally useless to me, and is actually dangerous as others have pointed out in other bugs related to this one.

Me: Remove 2 relationships from these 200 that are set.

Admin: Okay! I removed 200 things and added all 198 items back that did not match your 2! And I sent not one single pre_remove or post_remove signal''

Me: That's not at all what I said, and it totally matters how you go about it and that your signals+actions properly reflect what is being asked and not reflecting how you went about doing it.

I suspect it's no surprise that I consider this a bug, not a "cleanup/optimization" as it's now categorized :(

comment:14 by Carl Meyer, 12 years ago

Type: Cleanup/optimizationBug

Closed #16073 (which was specifically about incorrect m2m signals in the admin, as a result of this) as duplicate. Marking this a bug rather than a cleanup/optimization; previous to the existence of m2m signals it may have been the latter, but at this point the signaling problem makes it a bug.

by Simon Law, 12 years ago

Attachment: 6707_m2m_set.diff added

Initial patch

comment:15 by Simon Law, 12 years ago

Owner: changed from nobody to Simon Law
Patch needs improvement: unset
Status: newassigned

6707_m2m_set.diff includes a patch that performs the minimum amount of DELETE and INSERT statements for the __set__ method. In addition, I have changed _add_items() so it does not do an extra SELECT if __set__ has already removed duplicates, so we do not perform an extra query.

This patch applies to SVN trunk.

comment:16 by Simon Law, 12 years ago

It occurred to me, while writing this patch, that the __set__ methods for ManyRelatedObjectsDescriptor and ReverseManyRelatedObjectsDescriptor are basically the same. Should those two Descriptors be unified into a base descriptor?

comment:17 by Bradley Ayers <bradley.ayers@…>, 12 years ago

Cc: bradley.ayers@… added

This is fantastic, I just wanted to say thank-you very much for writing this latest patch!

comment:18 by Anssi Kääriäinen, 12 years ago

I haven't tested this ticket, just skimmed the latest patch. To me it seems this is worth getting into Django, the savings can be pretty large if one regularly does small changes to m2m fields using list assignment.

comment:19 by Simon Law, 12 years ago

akaariai: Are there any improvements to this patch that you can suggest?

comment:20 by Anssi Kääriäinen, 12 years ago

Not anything I spotted immediately. As said, I haven't gone through this in full detail. If you can get somebody to review the patch it would of course be a plus.

I will try to get time to work on this. I can't promise anything, though.

comment:21 by Simon Law, 12 years ago

It'd be great to get a core developer to look at whether the suggested refactoring above would be sensible.

comment:22 by Flavio Curella, 12 years ago

Cc: flavio.curella@… added

comment:23 by Béres Botond, 11 years ago

Cc: botondus@… added

Ran into this as well on multiple occasions.
This basically makes the m2m_changed signal almost useless :(

comment:24 by Ivan Virabyan, 11 years ago

As I understand there is currently no way to determine which values were added to many2many field in the admin?

comment:25 by RIck van der Zwet, 11 years ago

Cc: info@… added

comment:26 by a.krille@…, 10 years ago

Any news on this?

We too are bidden by this bug (and it is a bug when a signal is documented but never actually emitted) and would love to see a proper fix.

comment:27 by Markus Holtermann, 10 years ago

Cc: info@… added

comment:28 by anonymous, 10 years ago

This bug was opened "6 years ago" and we are still around the "patch". What would it take to get this fixed?

comment:29 by anonymous, 10 years ago

Shouldn't at least the documentation for previous versions including version 1.6 be updated to not include the pre_remove and post_remove actions in the m2m-changed signal?

comment:30 by loic84, 10 years ago

Owner: changed from Simon Law to loic84

comment:31 by anonymous, 10 years ago

Bitten too.

comment:32 by loic84, 10 years ago

Needs documentation: set
Needs tests: set
Patch needs improvement: set

There's no documentation yet and I still want to add more tests, but the latest effort for this issue is at https://github.com/django/django/pull/2500.

comment:33 by Loic Bistuer, 9 years ago

Needs documentation: unset
Needs tests: unset
Owner: changed from loic84 to Loic Bistuer

Resuming work on this issue, I brought the branch up-to-date and drafted some documentation. It'd be very helpful if some of those affected by this issue could give it a try.

comment:34 by Nils Lundquist, 9 years ago

I've tested the branch Loic has worked on and my issue still remains. m2m_changed is not triggered when the relationship is altered via a StackInline admin form. No doubt this is a significant bug.

Last edited 9 years ago by Nils Lundquist (previous) (diff)

comment:35 by Collin Anderson, 9 years ago

Cc: cmawebsite@… added

comment:36 by Collin Anderson, 9 years ago

Summary: Another implementation for ReverseManyRelatedObjectsDescriptor.__set__ methodManyToManyField clears and recreates all data

comment:37 by Loic Bistuer, 9 years ago

I've rebased the PR and I'm quite happy with the result.

I checked regarding comment:34 and indeed when M2M are displayed as admin inlines m2m_changed isn't triggered. It's because the M2M field is bypassed and the through table is seen by the admin as a normal model with 2 FKs. That's an orthogonal problem but I added docs about it in the PR.

comment:38 by Loic Bistuer, 9 years ago

Patch needs improvement: unset

comment:39 by Loic Bistuer <loic.bistuer@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In 71ada3a8e689a883b5ffdeb1744ea16f176ab730:

Fixed #6707 -- Added RelatedManager.set() and made descriptors' set use it.

Thanks Anssi Kääriäinen, Carl Meyer, Collin Anderson, and Tim Graham for the reviews.

comment:40 by Loic Bistuer <loic.bistuer@…>, 9 years ago

In 9d104a21e20f9c5ec41d19fd919d0e808aa13dba:

Documented that m2m_changed isn't triggered when M2M is displayed as admin inline.

Refs #6707.

comment:41 by Loic Bistuer <loic.bistuer@…>, 9 years ago

In 20eb51ce0ddcb8d5fdc772fc2839fefef5b28c2a:

Fix small regression caused by 71ada3a8e689a883b5ffdeb1744ea16f176ab730.

During direct assignment, evaluating the iterable before the transaction
is started avoids leaving the transaction dirty if an exception is raised.
This is slightly more wasteful but probably not enough to warrant a change
of behavior.

Thanks Anssi for the feedback. Refs #6707.

comment:42 by Nuno Khan, 9 years ago

@Loic is this commit going to be released for Django 1.7 still ??? I am using Django 1.7.8 and i noticed that its not there yet

comment:43 by Collin Anderson, 9 years ago

@psychok7 this is actually considered more of a new feature / optimization, so it's not backported to 1.7. It actually missed the deadline for getting into 1.8 by a few weeks, but it will be in 1.9 which comes out towards the end of the year.

comment:44 by Jeongsoo, Park, 8 years ago

Cc: toracle@… added
Note: See TracTickets for help on using tickets.
Back to Top