#35977 closed New feature (wontfix)
Serializing ManyToMany fields produces inconsistent order (on Postgres)
Reported by: | Alexander Todorov | Owned by: | |
---|---|---|---|
Component: | Core (Serialization) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Context:
I am migrating data from one DB to another, using serialization to dump the contents of each into JSON and then comparing the two dumps with diff. Occasionally, for models that have a ManyToMany field, the list of PKs for that field is not in the same order in both dumps. This happens quite often on Postgres, and I believe it comes from Postgres itself: when no explicit order is specified, it sometimes returns results in whatever order it sees fit.
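For the diffing use case itself, one workaround that needs no change to Django is to canonicalize both dumps before comparing them. This is a minimal sketch, assuming the dumps were produced by Django's "json" serializer (a list of objects with "model", "pk" and "fields" keys) and that you know which fields are ManyToMany. `normalize_dump` and the `m2m_fields` mapping are hypothetical names used only for illustration.

```python
import json


def normalize_dump(text, m2m_fields):
    """Return a canonical JSON string for a dump made by Django's "json" serializer.

    m2m_fields is a hypothetical mapping of model label (e.g. "app.book") to the
    names of that model's ManyToMany fields, whose value lists are treated as
    unordered. Other list-valued fields are left alone, so real order changes
    in them still show up in the diff.
    """
    records = json.loads(text)
    for record in records:
        for name in m2m_fields.get(record["model"], ()):
            values = record["fields"].get(name)
            if isinstance(values, list):
                # Sort by each value's JSON text: works for integer PKs, UUID
                # strings and natural-key lists alike. The result is canonical,
                # not numeric (10 sorts before 2), which is all a diff needs.
                values.sort(key=lambda v: json.dumps(v, sort_keys=True))
    # Also sort the records themselves, in case object order differs too.
    records.sort(key=lambda r: (r["model"], json.dumps(r.get("pk"))))
    return json.dumps(records, indent=2, sort_keys=True)


# Usage: two dumps that differ only in M2M order normalize to the same text.
before = '[{"model": "app.book", "pk": 1, "fields": {"title": "A", "authors": [3, 1, 2]}}]'
after = '[{"model": "app.book", "pk": 1, "fields": {"title": "A", "authors": [2, 3, 1]}}]'
fields = {"app.book": {"authors"}}
print(normalize_dump(before, fields) == normalize_dump(after, fields))  # prints True
```

Writing the normalized strings to files and running diff on those keeps the original workflow while ignoring M2M order only.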
Proposal:
Add a call to .order_by("pk") when handling m2m field serialization to make the results predictable.
Additional information
I have a commit in my own fork, https://github.com/atodorov/django/commit/1ae2f28ba42f28399f58d3edda98a35088225deb, which works great for me.
LMK if you want me to open a pull request.
Change History (3)
comment:1 by , 3 weeks ago
Has patch: | unset |
---|
comment:2 by , 3 weeks ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 3 weeks ago
Agree with Sarah here!
The lack of ordering consistency when retrieving objects from the database without specifying an explicit ordering is not a Postgres- or serialization-specific problem.
When no ORDER BY is specified, the database can choose to return rows in whatever order it wants, which is usually the fastest one. In the case of a database like MySQL, that usually means by primary key, as that's how the data is clustered/organized on disk, while on Postgres it depends on multiple factors.
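To make the point concrete, here is a small sketch using Python's built-in sqlite3 module and hypothetical `tag` / `article_tags` tables shaped like a Django M2M join table. SQLite is used only because it needs no server; the rule that row order is unspecified without ORDER BY is general SQL, not specific to one engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE article_tags (
        id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        tag_id INTEGER NOT NULL REFERENCES tag (id)
    );
    INSERT INTO tag (id, name) VALUES (1, 'zeta'), (2, 'alpha'), (3, 'mid');
    INSERT INTO article_tags (article_id, tag_id) VALUES (1, 3), (1, 1), (1, 2);
    """
)


def tag_ids(sql):
    """Run a query for article 1 and return the first column as a list."""
    return [row[0] for row in conn.execute(sql, (1,))]


# No ORDER BY: rows come back in whatever order the query plan produces.
# Nothing in SQL promises the same order next time, or on another database.
unordered = tag_ids("SELECT tag_id FROM article_tags WHERE article_id = ?")

# Explicit ORDER BY on the related primary key, roughly what the proposal asks for.
by_pk = tag_ids("SELECT tag_id FROM article_tags WHERE article_id = ? ORDER BY tag_id")

# Explicit ORDER BY on another column, like a model with Meta.ordering = ["name"].
by_name = tag_ids(
    "SELECT t.id FROM article_tags AS m JOIN tag AS t ON t.id = m.tag_id "
    "WHERE m.article_id = ? ORDER BY t.name"
)
print(by_pk, by_name)  # prints [1, 2, 3] [2, 3, 1]
```

Only the last two results are guaranteed, and they differ from each other, which is why a blanket order by primary key would change output for models that already ask for a different order.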
The patch you are proposing would be backward incompatible for two reasons:
- It would change the order of serialized data for models that have explicitly opted in to a particular order.
- Systematically ordering by primary key could slow down serialization for projects that don't care about the ordering of data.
In other words, databases don't return data in a stable order unless you explicitly ask for it, and Django provides ways to do so through Meta.ordering and manager overrides that call order_by().
This would mean we are determining the order rather than leaving it to model managers (and custom model managers might be setting an explicit order). So, I'm not sure that's a desirable behavior change.
You can raise this on the forum for discussion if you like: https://forum.djangoproject.com/c/internals/5