Opened 4 months ago

Closed 11 days ago

#36750 closed Bug (fixed)

dumpdata's output of m2m values is not deterministic

Reported by: Jacob Walls Owned by: VIZZARD-X
Component: Core (Serialization) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Recent failure of main-random CI job reveals that dumpdata's output of m2m values is not deterministic:

./runtests.py fixtures --shuffle 7825542710 --settings=test_postgres -k forward -v2
test_forward_reference_fk (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk) ... ok
test_forward_reference_m2m (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m) ... ok
test_forward_reference_fk_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_fk_natural_key) ... ok
test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key) ... FAIL

======================================================================
FAIL: test_forward_reference_m2m_natural_key (fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jwalls/django/tests/fixtures/tests.py", line 1308, in test_forward_reference_m2m_natural_key
    self._dumpdata_assert(
    ~~~~~~~~~~~~~~~~~~~~~^
        ["fixtures"],
        ^^^^^^^^^^^^^
    ...<8 lines>...
        natural_foreign_keys=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jwalls/django/tests/fixtures/tests.py", line 127, in _dumpdata_assert
    self.assertJSONEqual(command_output, output)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: [{'mo[94 chars] [['t3'], ['t2']]}}, {'model': 'fixtures.natur[179 chars][]}}] != [{'mo[94 chars] [['t2'], ['t3']]}}, {'model': 'fixtures.natur[179 chars][]}}]

First differing element 0:
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t3'], ['t2']]}}
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t2'], ['t3']]}}

  [{'fields': {'key': 't1',
               'other_thing': None,
-              'other_things': [['t3'], ['t2']]},
?                                     --------

+              'other_things': [['t2'], ['t3']]},
?                               ++++++++

    'model': 'fixtures.naturalkeything'},
   {'fields': {'key': 't2', 'other_thing': None, 'other_things': []},
    'model': 'fixtures.naturalkeything'},
   {'fields': {'key': 't3', 'other_thing': None, 'other_things': []},
    'model': 'fixtures.naturalkeything'}]

----------------------------------------------------------------------
Ran 4 tests in 0.071s

FAILED (failures=1)
Used shuffle seed: 7825542710 (given)

In the spirit of #24558 and #10381, I'm suggesting to make this deterministic instead of fiddling with the test.

A naive diff just adding ordering by "pk" fixes the failure ...:

diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py
index 2929874b01..496d3a5ba6 100644
--- a/django/core/serializers/python.py
+++ b/django/core/serializers/python.py
@@ -74,7 +74,7 @@ class Serializer(base.Serializer):
                     return value.natural_key()
 
                 def queryset_iterator(obj, field):
-                    attr = getattr(obj, field.name)
+                    attr = getattr(obj, field.name).order_by("pk")  # !!! not right
                     chunk_size = (
                         2000 if getattr(attr, "prefetch_cache_name", None) else None
                     )

... but to be mergeable, the patch would need to:

  • get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"
  • check for all locations in the python and xml serializers that might need this change

Change History (30)

comment:1 by VIZZARD-X, 4 months ago

I tried to reproduce this issue on the current main branch using the same shuffle seed and parameters from the report:
python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2

All four forward_reference tests pass, including:
test_forward_reference_m2m_natural_key

Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.

Could the ticket be reviewed and possibly closed if others also cannot reproduce it?

comment:2 by Simon Charette, 4 months ago

Triage Stage: UnreviewedAccepted

@Jacob

get the default ordering from somewhere better (through model? target model?), instead of hardcoding "pk"

Since the objects will be filtered by from_model_id = ? I think that ordering by to_model_id (AKA target model) is a good candidate.

@VIZZARD-X

Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.

It might be finicky to reproduce as it entirely depends on the order the database decides to returns objects which is backend dependent.

python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_sqlite -k forward -v2

The report explicitly state that the issue was reproduced against Postgres (see --settings=test_postgres) and you tried reproducing agaisnt SQLite so this is highly likely why you can't reproduce. In most cases SQLite will objects in their order of creation while in the case of Postgres more factors come into play.

Given this, I’m unable to reproduce the nondeterministic M2M natural key ordering on the latest code. It appears the issue may already have been resolved by recent changes in the serializers or related refactors.

Which recent changes are you referring to? The problem was observed less than a day ago (see ticket creation date) and no change to serialization ordering landed on main since then.

comment:3 by Ahmed Asar, 4 months ago

Hi, I’d like to work on this issue and submit a patch. I’m still learning Django core, but I want to try reproducing the test failure and understanding the m2m serialization behavior.

comment:4 by VIZZARD-X, 4 months ago

Thanks, Simon — that makes sense.

I realized my earlier attempt used tests.test_sqlite, which explains why the issue didn’t reproduce.
Running the same test suite against Postgres: python tests/runtests.py fixtures --shuffle 7825542710 --settings=tests.test_postgres -k forward -v2

I can now reproduce the nondeterministic ordering exactly as reported by @Jacob.

I’ll work on a patch that:

  • orders the M2M related manager by the target model’s primary key (as you suggested),
  • updates both Python + XML serializers, and
  • adds a test for deterministic natural key ordering.

I’ll submit a PR shortly.

comment:5 by Simon Charette, 4 months ago

orders the M2M related manager by the target model’s primary key (as you suggested),

That's not what I suggested though, it should be the local field field (from the many-to-many model's perspective) pointing at the target / remote model.

comment:6 by VIZZARD-X, 4 months ago

Updated patch and opened a new PR after refreshing my fork:

https://github.com/django/django/pull/20308

This version applies database-level primary key ordering for natural-key
many-to-many relations, per the guidance from Jacob and Simon. It avoids
Python-side sorting and only applies DB ordering when:

  • natural foreign keys are enabled,
  • the related model defines natural_key(), and
  • the model has no explicit Meta.ordering.

All serializer tests, including PostgreSQL with --shuffle, now pass
consistently. The change is limited to Python and XML serializers.

comment:7 by VIZZARD-X, 4 months ago

Has patch: set

comment:8 by Jacob Walls, 3 months ago

Patch needs improvement: set

Will look after the merge conflicts are fixed.

comment:9 by Kundan Yadav, 3 months ago

Owner: set to Kundan Yadav
Status: newassigned

can i work on this issue as it was last updates 2 weeks ago

comment:10 by Jacob Walls, 3 months ago

No, 2 weeks is not much time for volunteers. There's an update in the PR saying to please have some patience for updates.

comment:11 by Jacob Walls, 3 months ago

Owner: changed from Kundan Yadav to VIZZARD-X

comment:12 by Jacob Walls, 2 months ago

Many unrelated changes still mixed in with the diff.

comment:13 by Jacob Walls, 2 months ago

Needs tests: set
Patch needs improvement: unset

comment:14 by Jacob Walls, 2 months ago

Needs tests: unset

comment:15 by Jacob Walls, 2 months ago

Patch needs improvement: set

comment:16 by Jacob Walls, 2 months ago

Triage Stage: AcceptedSomeday/Maybe

Makes sense to hold this work until we have a totally_ordered property on QuerySet, see suggestion.

comment:17 by VIZZARD-X, 3 weeks ago

Hi Jacob,

Now that #36857 has been merged, the blocker for this ticket is resolved. I have updated the PR to utilize the new property as discussed.

Could we please move the Triage Stage back to Accepted?

Thanks!

comment:18 by VIZZARD-X, 3 weeks ago

Patch needs improvement: unset

comment:19 by Jacob Walls, 3 weeks ago

Triage Stage: Someday/MaybeAccepted

comment:20 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

comment:21 by VIZZARD-X, 2 weeks ago

Patch needs improvement: unset

comment:22 by Jacob Walls, 13 days ago

Patch needs improvement: set

comment:23 by VIZZARD-X, 13 days ago

Patch needs improvement: unset

comment:24 by Jacob Walls, 13 days ago

Patch needs improvement: set

comment:25 by Jacob Walls, 12 days ago

Needs tests: set

Simon, can I ask you to flesh out comment:5? I'm not sure what ordering you're suggesting and whether it would still be totally ordered.

comment:26 by Simon Charette, 12 days ago

Cc: Simon Charette added

comment:27 by Simon Charette, 12 days ago

Jacob, ordering by the primary key ensures the ordering is total so that should do if it makes things simpler.

My comment:5 was just to point out to comment:4 that in comment:3 I suggested using the field referenced by the many-to-many field which is not necessarily the remote primary key.

The referenced field has to be unique as well and felt like a more natural default than necessarily picking the primary key but given this ticket main focus is making sure the output is deterministic defaulting to the primary key is a good choice particularly given the detour we took with sql.Query.totally_ordered (thank you for tha btw).

Last edited 12 days ago by Simon Charette (previous) (diff)

comment:28 by VIZZARD-X, 12 days ago

Needs tests: unset
Patch needs improvement: unset

comment:29 by Jacob Walls, 11 days ago

Triage Stage: AcceptedReady for checkin

comment:30 by Jacob Walls <jacobtylerwalls@…>, 11 days ago

Resolution: fixed
Status: assignedclosed

In e6108b7:

Fixed #36750 -- Made ordering of M2M objects deterministic in serializers.

Co-authored-by: Simon Charette <charette.s@…>
Co-authored-by: Jacob Walls <jacobtylerwalls@…>

Note: See TracTickets for help on using tickets.
Back to Top