#28514 closed Cleanup/optimization (fixed)
Clarify docs regarding idempotence of RelatedManager.add()
Reported by: | Дилян Палаузов | Owned by: | Jezeniel Zapanta |
---|---|---|---|
Component: | Documentation | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | yes | UI/UX: | no |
Description (last modified by )
At https://docs.djangoproject.com/en/1.11/topics/db/examples/many_to_many/ it's written:
>> a2.publications.add(p3) > > Adding a second time is OK: > >> a2.publications.add(p3)
Please rephrase it to "Adding a second time is OK, it doesn't duplicate the relation".
At https://docs.djangoproject.com/en/1.11/ref/models/relations/ it's written:
add(*objs, bulk=True) Adds the specified model objects to the related object set.
Please amend: "Inserting existing relations this way does not cause problems."
The purpose of the changes is to make clear, that add()
is idempotent and it is not necessary first to read the data from the database, join with the new values and eventually write the resulting set back.
Change History (19)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|---|
Easy pickings: | set |
Summary: | Documentating idempotence of Many2ManyField.add() → Clarify docs regarding idempotence of RelatedManager.add() |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 7 years ago
Has patch: | set |
---|
comment:4 by , 7 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I've reviewed the PR, it all looks good to me.
comment:5 by , 7 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:6 by , 7 years ago
It turns out, that add() suffers from concurrency problem and is therefore not idempotent:
django/db/models/fields/related_descriptors.py:create_forward_many_to_many_manager.ManyRelatedManager.add() calls self.add(), which does
with transaction.atomic(using=db. savepoint=False): self._add_items(self.source_Field_name, self.target_field_name, *objs)
and _add_items does
with transaction.atomic(using=db, savepoint=False): signals.m2m_changes.send(...) self.through._default_manager.using(db).bulk_create([...])
Minor question: providing that add() is the only caller of _add_items, does the second transaction(savepoint=False) have added value?
Major question: bulk_create([]) can fail, if two threads simultaneously try to insert the same data, in which case the m2m_changed signal is sent, but bulk_create fails completely.
My proposal:
- First try bulk_insert, if it does not throw exception, send m2m_changed signal.
- Either update the documentation of .add() accordingly (if it throws IntegrityError, the caller shall retry the operation), or make .add() do the retries internally.
- Once bulk_create(... on_conflict='ignore') #28668 is implemented, revert the previous step, pass on_conflict='ignore' to bulk_create, and:
- In case of Postgresql retrieve information from bulk_create which objects were actually inserted, and send only for them m2m_changed
- For other databases, either send signal for all objects, even those which were in the database, don't use bulk_create or throw IntegrityError
comment:7 by , 7 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:10 by , 7 years ago
I could work on this one, this is my first time though. I saw the pull request, I can improve that with additional information about errors being raised or signals being sent.
Will work on this over the weekend.
comment:11 by , 7 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
follow-up: 14 comment:13 by , 7 years ago
Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.
comment:14 by , 7 years ago
Replying to Дилян Палаузов:
Regarding the PR at https://github.com/django/django/pull/9314 , as noted in my last comment above, .add() doesn't work as documented, if two threads call it with partially overlapping set of objects.
What is the next step? Should I change the documentation that to handle concurrency problems you should handle IntegrityError? or wait for #28668 to be Implemented?
comment:15 by , 7 years ago
I don't think that the ON CONFLICT DO NOTHING change will get into Django 2.0, even if #28668 was fixed yesterday.
Under these circumstances for the next year or so .add(one more elements)
can raise an IntegrityError
or send wrong m2m_changed signals and this should be documented.
comment:16 by , 7 years ago
Any progress here?
I think only the concurrent aspect of .add() is missing:
- if two .add() are called in parallel with overlapping set of related objects, one of them can fail with IntegrityError and has to be repeated. This is caused by the fact that Django inserts the objects within a transaction and only after the transaction is closed it can be determined if the changes are permament. Django does not retry in this case
- When IntegrityError is thrown, m2m_changed is still sent for the objects, that were supposed to be matched.
The latter is valid if I recall correctly the circumstances from five months ago. The code can certainly be changed so that m2m_changed is not sent, but it is better to document the current behaviour than doing noting.
Jack, I would like to encourage you to use the opportunity and check the implementation, while you describe it.
Apart from this my plan is to work with Django for one more month and I will be glad if by then all concerns related to this ticket can be clarified.
comment:17 by , 7 years ago
If RelatedManager.add()
isn't idempotent, then that sounds like a bug. We prefer to spend time fixing bugs rather than documenting them, so that's the approach I'd suggest.
comment:18 by , 7 years ago
This approach is nice, but status quo is misleading the ones who try to use .add()
.
If currently nobody has the time to fix the bug, then it is more likely that somebody will have the time to document the behaviour, as this requires less time.
comment:19 by , 7 years ago
Undoubtedly https://github.com/django/django/pull/9314 shall be integrated, it fixes the problem this ticket initially raised and is independent of other problems .add() has.
Unfortunately on the above link is shown some tests have failed, but I cannot see now why.
The concurrent mystery is tracked under #19544.
Made suggested changes: https://github.com/django/django/pull/8953