Opened 3 weeks ago
Last modified 8 days ago
#36750 new Bug
dumpdata's output of m2m values is not deterministic
| Reported by: | Jacob Walls | Owned by: | |
|---|---|---|---|
| Component: | Core (Serialization) | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
Recent failure of main-random CI job reveals that dumpdata's output of m2m values is not deterministic:
./runtests.py fixtures --shuffle 7825542710 --settings=test_postgres -k forward -v2
test_forward_reference_fk (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk) ... ok
test_forward_reference_m2m (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m) ... ok
test_forward_reference_fk_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk_natural_key) ... ok
test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key) ... FAIL
======================================================================
FAIL: test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/jwalls/django/tests/fixtures/tests.py", line 1308, in test_forward_reference_m2m_natural_key
self._dumpdata_assert(
~~~~~~~~~~~~~~~~~~~~~^
["fixtures"],
^^^^^^^^^^^^^
...<8 lines>...
natural_foreign_keys=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/jwalls/django/tests/fixtures/tests.py", line 127, in _dumpdata_assert
self.assertJSONEqual(command_output, output)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: [{'mo[94 chars] [['t3'], ['t2']]}}, {'model': 'fixtures.natur[179 chars][]}}] != [{'mo[94 chars] [['t2'], ['t3']]}}, {'model': 'fixtures.natur[179 chars][]}}]
First differing element 0:
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t3'], ['t2']]}}
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t2'], ['t3']]}}
[{'fields': {'key': 't1',
'other_thing': None,
- 'other_things': [['t3'], ['t2']]},
? --------
+ 'other_things': [['t2'], ['t3']]},
? ++++++++
'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't2', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't3', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'}]
----------------------------------------------------------------------
Ran 4 tests in 0.071s
FAILED (failures=1)
Used shuffle seed: 7825542710 (given)
In the spirit of #24558 and #10381, I'm suggesting to make this deterministic instead of fiddling with the test.
A naive diff just adding ordering by "pk" fixes the failure ...:
diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py index 2929874b01..496d3a5ba6 100644 --- a/django/core/serializers/python.py +++ b/django/core/serializers/python.py @@ -74,7 +74,7 @@ class Serializer(base.Serializer): return value.natural_key() def queryset_iterator(obj, field): - attr = getattr(obj, field.name) + attr = getattr(obj, field.name).order_by("pk") # !!! not right chunk_size = ( 2000 if getattr(attr, "prefetch_cache_name", None) else None )
... but to be mergeable, the patch would need to:
- get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"
- check for all locations in the python and xml serializers that might need this change
Change History (8)
comment:1 by , 3 weeks ago
comment:2 by , 3 weeks ago
| Triage Stage: | Unreviewed → Accepted |
|---|
@Jacob
get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"
Since the objects will be filtered by from_model_id = ? I think that ordering by to_model_id (AKA target model) is a good candidate.
@VIZZARD-X
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
It might be finicky to reproduce as it entirely depends on the order the database decides to returns objects which is backend dependent.
python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2
The report explicitly state that the issue was reproduced against Postgres (see --settings=test_postgres) and you tried reproducing agaisnt SQLite so this is highly likely why you can't reproduce. In most cases SQLite will objects in their order of creation while in the case of Postgres more factors come into play.
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
Which recent changes are you referring to? The problem was observed less than a day ago (see ticket creation date) and no change to serialization ordering landed on main since then.
comment:3 by , 3 weeks ago
Hi, I’d like to work on this issue and submit a patch. I’m still learning Django core, but I want to try reproducing the test failure and understanding the m2m serialization behavior.
comment:4 by , 3 weeks ago
Thanks, Simon — that makes sense.
I realized my earlier attempt used tests.test_sqlite, which explains why the issue didn’t reproduce.
Running the same test suite against Postgres: python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_postgres -k forward -v2
I can now reproduce the nondeterministic ordering exactly as reported by @Jacob.
I’ll work on a patch that:
- orders the M2M related manager by the target model’s primary key (as you suggested),
- updates both Python + XML serializers, and
- adds a test for deterministic natural key ordering.
I’ll submit a PR shortly.
comment:5 by , 3 weeks ago
orders the M2M related manager by the target model’s primary key (as you suggested),
That's not what I suggested though, it should be the local field field (from the many-to-many model's perspective) pointing at the target / remote model.
comment:6 by , 3 weeks ago
Updated patch and opened a new PR after refreshing my fork:
https://github.com/django/django/pull/20308
This version applies database-level primary key ordering for natural-key
many-to-many relations, per the guidance from Jacob and Simon. It avoids
Python-side sorting and only applies DB ordering when:
- natural foreign keys are enabled,
- the related model defines natural_key(), and
- the model has no explicit Meta.ordering.
All serializer tests, including PostgreSQL with --shuffle, now pass
consistently. The change is limited to Python and XML serializers.
comment:7 by , 3 weeks ago
| Has patch: | set |
|---|
comment:8 by , 8 days ago
| Patch needs improvement: | set |
|---|
Will look after the merge conflicts are fixed.
I tried to reproduce this issue on the current main branch using the same shuffle seed and parameters from the report:
python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2
All four forward_reference tests pass, including:
test_forward_reference_m2m_natural_key
Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.
Could the ticket be reviewed and possibly closed if others also cannot reproduce it?