Opened 3 months ago

Last modified 2 days ago

#28514 assigned Cleanup/optimization

Clarify docs regarding idempotence of RelatedManager.add()

Reported by: Дилян Палаузов Owned by: Jezeniel Zapanta
Component: Documentation Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: yes UI/UX: no

Description (last modified by Tim Graham)

At https://docs.djangoproject.com/en/1.11/topics/db/examples/many_to_many/ it's written:

>> a2.publications.add(p3)
>
> Adding a second time is OK:
> 
>> a2.publications.add(p3)

Please rephrase it to "Adding a second time is OK, it doesn't duplicate the relation".

At https://docs.djangoproject.com/en/1.11/ref/models/relations/ it's written:

 add(*objs, bulk=True)
 Adds the specified model objects to the related object set.

Please amend: "Inserting existing relations this way does not cause problems."

The purpose of the changes is to make clear, that add() is idempotent and it is not necessary first to read the data from the database, join with the new values and eventually write the resulting set back.

Change History (13)

comment:1 Changed 3 months ago by Tim Graham

Description: modified (diff)
Easy pickings: set
Summary: Documentating idempotence of Many2ManyField.add()Clarify docs regarding idempotence of RelatedManager.add()
Triage Stage: UnreviewedAccepted

comment:2 Changed 3 months ago by Jack Mustacato

Owner: changed from nobody to Jack Mustacato
Status: newassigned

comment:3 Changed 3 months ago by Jack Mustacato

Has patch: set

comment:4 Changed 3 months ago by Tim Martin

Triage Stage: AcceptedReady for checkin

I've reviewed the PR, it all looks good to me.

comment:5 Changed 3 months ago by Tim Graham

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:6 Changed 6 weeks ago by Дилян Палаузов

It turns out, that add() suffers from concurrency problem and is therefore not idempotent:

django/db/models/fields/related_descriptors.py:create_forward_many_to_many_manager.ManyRelatedManager.add() calls self.add(), which does

with transaction.atomic(using=db. savepoint=False):
    self._add_items(self.source_Field_name, self.target_field_name, *objs)

and _add_items does

with transaction.atomic(using=db, savepoint=False):
    signals.m2m_changes.send(...)
    self.through._default_manager.using(db).bulk_create([...])

Minor question: providing that add() is the only caller of _add_items, does the second transaction(savepoint=False) have added value?
Major question: bulk_create([]) can fail, if two threads simultaneously try to insert the same data, in which case the m2m_changed signal is sent, but bulk_create fails completely.

My proposal:

  • First try bulk_insert, if it does not throw exception, send m2m_changed signal.
  • Either update the documentation of .add() accordingly (if it throws IntegrityError, the caller shall retry the operation), or make .add() do the retries internally.
  • Once bulk_create(... on_conflict='ignore') #28668 is implemented, revert the previous step, pass on_conflict='ignore' to bulk_create, and:
    • In case of Postgresql retrieve information from bulk_create which objects were actually inserted, and send only for them m2m_changed
    • For other databases, either send signal for all objects, even those which were in the database, don't use bulk_create or throw IntegrityError
Last edited 12 days ago by Jezeniel Zapanta (previous) (diff)

comment:7 Changed 6 weeks ago by Jack Mustacato

Owner: Jack Mustacato deleted
Status: assignednew

comment:10 Changed 5 weeks ago by Jezeniel Zapanta

I could work on this one, this is my first time though. I saw the pull request, I can improve that with additional information about errors being raised or signals being sent.
Will work on this over the weekend.

Last edited 5 weeks ago by Jezeniel Zapanta (previous) (diff)

comment:11 Changed 5 weeks ago by Jezeniel Zapanta

Owner: set to Jezeniel Zapanta
Status: newassigned

comment:12 Changed 2 weeks ago by Jezeniel Zapanta

Hi, submitted the PR, for your review.

comment:13 Changed 13 days ago by Дилян Палаузов

Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.

Last edited 13 days ago by Tim Graham (previous) (diff)

comment:14 in reply to:  13 Changed 12 days ago by Jezeniel Zapanta

Replying to Дилян Палаузов:

Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.

What is the next step? Should I change the documentation that to handle concurrency problems you should handle IntegrityError? or wait for #28668 to be Implemented?

comment:15 Changed 2 days ago by Дилян Палаузов

I don't think that the ON CONFLICT DO NOTHING change will get into Django 2.0, even if #28668 was fixed yesterday.

Under these circumstances for the next year or so .add(one more elements) can raise an IntegrityError or send wrong m2m_changed signals and this should be documented.

Note: See TracTickets for help on using tickets.
Back to Top