Opened 11 months ago

Closed 4 months ago

Last modified 4 months ago

#28514 closed Cleanup/optimization (fixed)

Clarify docs regarding idempotence of RelatedManager.add()

Reported by: Дилян Палаузов Owned by: Jezeniel Zapanta
Component: Documentation Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: yes UI/UX: no

Description (last modified by Tim Graham)

At https://docs.djangoproject.com/en/1.11/topics/db/examples/many_to_many/ it's written:

>> a2.publications.add(p3)
>
> Adding a second time is OK:
> 
>> a2.publications.add(p3)

Please rephrase it to "Adding a second time is OK, it doesn't duplicate the relation".

At https://docs.djangoproject.com/en/1.11/ref/models/relations/ it's written:

 add(*objs, bulk=True)
 Adds the specified model objects to the related object set.

Please amend: "Inserting existing relations this way does not cause problems."

The purpose of the changes is to make clear, that add() is idempotent and it is not necessary first to read the data from the database, join with the new values and eventually write the resulting set back.

Change History (19)

comment:1 Changed 11 months ago by Tim Graham

Description: modified (diff)
Easy pickings: set
Summary: Documentating idempotence of Many2ManyField.add()Clarify docs regarding idempotence of RelatedManager.add()
Triage Stage: UnreviewedAccepted

comment:2 Changed 11 months ago by Jack Mustacato

Owner: changed from nobody to Jack Mustacato
Status: newassigned

comment:3 Changed 11 months ago by Jack Mustacato

Has patch: set

comment:4 Changed 11 months ago by Tim Martin

Triage Stage: AcceptedReady for checkin

I've reviewed the PR, it all looks good to me.

comment:5 Changed 11 months ago by Tim Graham

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:6 Changed 9 months ago by Дилян Палаузов

It turns out, that add() suffers from concurrency problem and is therefore not idempotent:

django/db/models/fields/related_descriptors.py:create_forward_many_to_many_manager.ManyRelatedManager.add() calls self.add(), which does

with transaction.atomic(using=db. savepoint=False):
    self._add_items(self.source_Field_name, self.target_field_name, *objs)

and _add_items does

with transaction.atomic(using=db, savepoint=False):
    signals.m2m_changes.send(...)
    self.through._default_manager.using(db).bulk_create([...])

Minor question: providing that add() is the only caller of _add_items, does the second transaction(savepoint=False) have added value?
Major question: bulk_create([]) can fail, if two threads simultaneously try to insert the same data, in which case the m2m_changed signal is sent, but bulk_create fails completely.

My proposal:

  • First try bulk_insert, if it does not throw exception, send m2m_changed signal.
  • Either update the documentation of .add() accordingly (if it throws IntegrityError, the caller shall retry the operation), or make .add() do the retries internally.
  • Once bulk_create(... on_conflict='ignore') #28668 is implemented, revert the previous step, pass on_conflict='ignore' to bulk_create, and:
    • In case of Postgresql retrieve information from bulk_create which objects were actually inserted, and send only for them m2m_changed
    • For other databases, either send signal for all objects, even those which were in the database, don't use bulk_create or throw IntegrityError
Last edited 8 months ago by Jezeniel Zapanta (previous) (diff)

comment:7 Changed 9 months ago by Jack Mustacato

Owner: Jack Mustacato deleted
Status: assignednew

comment:10 Changed 9 months ago by Jezeniel Zapanta

I could work on this one, this is my first time though. I saw the pull request, I can improve that with additional information about errors being raised or signals being sent.
Will work on this over the weekend.

Last edited 9 months ago by Jezeniel Zapanta (previous) (diff)

comment:11 Changed 9 months ago by Jezeniel Zapanta

Owner: set to Jezeniel Zapanta
Status: newassigned

comment:12 Changed 9 months ago by Jezeniel Zapanta

Hi, submitted the PR, for your review.

comment:13 Changed 8 months ago by Дилян Палаузов

Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.

Last edited 8 months ago by Tim Graham (previous) (diff)

comment:14 in reply to:  13 Changed 8 months ago by Jezeniel Zapanta

Replying to Дилян Палаузов:

Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.

What is the next step? Should I change the documentation that to handle concurrency problems you should handle IntegrityError? or wait for #28668 to be Implemented?

comment:15 Changed 8 months ago by Дилян Палаузов

I don't think that the ON CONFLICT DO NOTHING change will get into Django 2.0, even if #28668 was fixed yesterday.

Under these circumstances for the next year or so .add(one more elements) can raise an IntegrityError or send wrong m2m_changed signals and this should be documented.

comment:16 Changed 5 months ago by Дилян Палаузов

Any progress here?

I think only the concurrent aspect of .add() is missing:

  • if two .add() are called in parallel with overlapping set of related objects, one of them can fail with IntegrityError and has to be repeated. This is caused by the fact that Django inserts the objects within a transaction and only after the transaction is closed it can be determined if the changes are permament. Django does not retry in this case
  • When IntegrityError is thrown, m2m_changed is still sent for the objects, that were supposed to be matched.

The latter is valid if I recall correctly the circumstances from five months ago. The code can certainly be changed so that m2m_changed is not sent, but it is better to document the current behaviour than doing noting.

Jack, I would like to encourage you to use the opportunity and check the implementation, while you describe it.

Apart from this my plan is to work with Django for one more month and I will be glad if by then all concerns related to this ticket can be clarified.

comment:17 Changed 5 months ago by Tim Graham

If RelatedManager.add() isn't idempotent, then that sounds like a bug. We prefer to spend time fixing bugs rather than documenting them, so that's the approach I'd suggest.

comment:18 Changed 5 months ago by Дилян Палаузов

This approach is nice, but status quo is misleading the ones who try to use .add().

If currently nobody has the time to fix the bug, then it is more likely that somebody will have the time to document the behaviour, as this requires less time.

comment:19 Changed 4 months ago by Дилян Палаузов

Undoubtedly ​https://github.com/django/django/pull/9314 shall be integrated, it fixes the problem this ticket initially raised and is independent of other problems .add() has.

Unfortunately on the above link is shown some tests have failed, but I cannot see now why.

The concurrent mystery is tracked under #19544.

comment:20 Changed 4 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In abe6c5de:

Fixed #28514 -- Clarifed docs about idempotence of RelatedManager.add().

comment:21 Changed 4 months ago by Tim Graham <timograham@…>

In 83362cb:

[2.0.x] Fixed #28514 -- Clarifed docs about idempotence of RelatedManager.add().

Backport of abe6c5defefc7057e7fb5f47b79643f7b89f7d90 from master

Note: See TracTickets for help on using tickets.
Back to Top