Opened 4 months ago
Closed 11 days ago
#36750 closed Bug (fixed)
dumpdata's output of m2m values is not deterministic
| Reported by: | Jacob Walls | Owned by: | VIZZARD-X |
|---|---|---|---|
| Component: | Core (Serialization) | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Simon Charette | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Recent failure of main-random CI job reveals that dumpdata's output of m2m values is not deterministic:
./runtests.py fixtures --shuffle 7825542710 --settings=test_postgres -k forward -v2
test_forward_reference_fk (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk) ... ok
test_forward_reference_m2m (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m) ... ok
test_forward_reference_fk_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk_natural_key) ... ok
test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key) ... FAIL
======================================================================
FAIL: test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/jwalls/django/tests/fixtures/tests.py", line 1308, in test_forward_reference_m2m_natural_key
self._dumpdata_assert(
~~~~~~~~~~~~~~~~~~~~~^
["fixtures"],
^^^^^^^^^^^^^
...<8 lines>...
natural_foreign_keys=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/jwalls/django/tests/fixtures/tests.py", line 127, in _dumpdata_assert
self.assertJSONEqual(command_output, output)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: [{'mo[94 chars] [['t3'], ['t2']]}}, {'model': 'fixtures.natur[179 chars][]}}] != [{'mo[94 chars] [['t2'], ['t3']]}}, {'model': 'fixtures.natur[179 chars][]}}]
First differing element 0:
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t3'], ['t2']]}}
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t2'], ['t3']]}}
[{'fields': {'key': 't1',
'other_thing': None,
- 'other_things': [['t3'], ['t2']]},
? --------
+ 'other_things': [['t2'], ['t3']]},
? ++++++++
'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't2', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't3', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'}]
----------------------------------------------------------------------
Ran 4 tests in 0.071s
FAILED (failures=1)
Used shuffle seed: 7825542710 (given)
In the spirit of #24558 and #10381, I'm suggesting to make this deterministic instead of fiddling with the test.
A naive diff just adding ordering by "pk" fixes the failure ...:
diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py index 2929874b01..496d3a5ba6 100644 --- a/django/core/serializers/python.py +++ b/django/core/serializers/python.py @@ -74,7 +74,7 @@ class Serializer(base.Serializer): return value.natural_key() def queryset_iterator(obj, field): - attr = getattr(obj, field.name) + attr = getattr(obj, field.name).order_by("pk") # !!! not right chunk_size = ( 2000 if getattr(attr, "prefetch_cache_name", None) else None )
... but to be mergeable, the patch would need to:
- get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"
- check for all locations in the python and xml serializers that might need this change
Change History (30)
comment:1 by , 4 months ago
comment:2 by , 4 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|
@Jacob
get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"
Since the objects will be filtered by from_model_id = ? I think that ordering by to_model_id (AKA target model) is a good candidate.
@VIZZARD-X
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
It might be finicky to reproduce as it entirely depends on the order the database decides to returns objects which is backend dependent.
python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2
The report explicitly state that the issue was reproduced against Postgres (see --settings=test_postgres) and you tried reproducing agaisnt SQLite so this is highly likely why you can't reproduce. In most cases SQLite will objects in their order of creation while in the case of Postgres more factors come into play.
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
Which recent changes are you referring to? The problem was observed less than a day ago (see ticket creation date) and no change to serialization ordering landed on main since then.
comment:3 by , 4 months ago
Hi, I’d like to work on this issue and submit a patch. I’m still learning Django core, but I want to try reproducing the test failure and understanding the m2m serialization behavior.
comment:4 by , 4 months ago
Thanks, Simon — that makes sense.
I realized my earlier attempt used tests.test_sqlite, which explains why the issue didn’t reproduce.
Running the same test suite against Postgres: python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_postgres -k forward -v2
I can now reproduce the nondeterministic ordering exactly as reported by @Jacob.
I’ll work on a patch that:
- orders the M2M related manager by the target model’s primary key (as you suggested),
- updates both Python + XML serializers, and
- adds a test for deterministic natural key ordering.
I’ll submit a PR shortly.
comment:5 by , 4 months ago
orders the M2M related manager by the target model’s primary key (as you suggested),
That's not what I suggested though, it should be the local field field (from the many-to-many model's perspective) pointing at the target / remote model.
comment:6 by , 4 months ago
Updated patch and opened a new PR after refreshing my fork:
https://github.com/django/django/pull/20308
This version applies database-level primary key ordering for natural-key
many-to-many relations, per the guidance from Jacob and Simon. It avoids
Python-side sorting and only applies DB ordering when:
- natural foreign keys are enabled,
- the related model defines natural_key(), and
- the model has no explicit Meta.ordering.
All serializer tests, including PostgreSQL with --shuffle, now pass
consistently. The change is limited to Python and XML serializers.
comment:7 by , 4 months ago
| Has patch: | set |
|---|
comment:8 by , 3 months ago
| Patch needs improvement: | set |
|---|
Will look after the merge conflicts are fixed.
comment:9 by , 3 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
can i work on this issue as it was last updates 2 weeks ago
comment:10 by , 3 months ago
No, 2 weeks is not much time for volunteers. There's an update in the PR saying to please have some patience for updates.
comment:11 by , 3 months ago
| Owner: | changed from to |
|---|
comment:13 by , 2 months ago
| Needs tests: | set |
|---|---|
| Patch needs improvement: | unset |
comment:14 by , 2 months ago
| Needs tests: | unset |
|---|
comment:15 by , 2 months ago
| Patch needs improvement: | set |
|---|
comment:16 by , 2 months ago
| Triage Stage: | Accepted → Someday/Maybe |
|---|
Makes sense to hold this work until we have a totally_ordered property on QuerySet, see suggestion.
comment:17 by , 3 weeks ago
Hi Jacob,
Now that #36857 has been merged, the blocker for this ticket is resolved. I have updated the PR to utilize the new property as discussed.
Could we please move the Triage Stage back to Accepted?
Thanks!
comment:18 by , 3 weeks ago
| Patch needs improvement: | unset |
|---|
comment:19 by , 3 weeks ago
| Triage Stage: | Someday/Maybe → Accepted |
|---|
comment:20 by , 3 weeks ago
| Patch needs improvement: | set |
|---|
comment:21 by , 2 weeks ago
| Patch needs improvement: | unset |
|---|
comment:22 by , 13 days ago
| Patch needs improvement: | set |
|---|
comment:23 by , 13 days ago
| Patch needs improvement: | unset |
|---|
comment:24 by , 13 days ago
| Patch needs improvement: | set |
|---|
comment:25 by , 12 days ago
| Needs tests: | set |
|---|
Simon, can I ask you to flesh out comment:5? I'm not sure what ordering you're suggesting and whether it would still be totally ordered.
comment:26 by , 12 days ago
| Cc: | added |
|---|
comment:27 by , 12 days ago
Jacob, ordering by the primary key ensures the ordering is total so that should do if it makes things simpler.
My comment:5 was just to point out to comment:4 that in comment:3 I suggested using the field referenced by the many-to-many field which is not necessarily the remote primary key.
The referenced field has to be unique as well and felt like a more natural default than necessarily picking the primary key but given this ticket main focus is making sure the output is deterministic defaulting to the primary key is a good choice particularly given the detour we took with sql.Query.totally_ordered (thank you for tha btw).
comment:28 by , 12 days ago
| Needs tests: | unset |
|---|---|
| Patch needs improvement: | unset |
comment:29 by , 11 days ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
I tried to reproduce this issue on the current main branch using the same shuffle seed and parameters from the report:
python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2
All four forward_reference tests pass, including:
test_forward_reference_m2m_natural_key
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
Could the ticket be reviewed and possibly closed if others also cannot reproduce it?