Opened 16 months ago
Closed 16 months ago
#34697 closed Cleanup/optimization (fixed)
Migration serializer for sets results in non-deterministic order.
Reported by: | Yury V. Zaytsev | Owned by: | nobody |
---|---|---|---|
Component: | Migrations | Version: | 4.2 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
We are using quite a lot of complex index_together
/ unique_together
constraints on our models, and the output in the generated migrations is flip-flopping all the time like follows, causing spurious diffs in our checkouts:
migrations.AlterUniqueTogether( + unique_together={("tenant", "dealer"), ("tenant", "order")}, - unique_together={("tenant", "order"), ("tenant", "dealer")},
This is happening because these constraints are normalized to sets internally in the ModelState, which kind of makes sense, but unfortunately set iteration order (unlike dicts!) is unstable in Python 3 due to hash randomization. However, migrations serializer doesn't have any special facilities for ensuring stable output for sets and this is what causes annoying diffs for us all the time.
I suggest to add a trivial serializer specifically for unordered sequences which ensures stable output no matter the iteration order. Stability can be achieved by sorting elements in the set by their string representation. This only affects the writer output, and doesn't interfere with the rest of Django in any way, so this change only improves developer experience, but has no effect on the performance and/or reliability.
I hope that even though it's apparently not a major problem for most users you would still accept the fix to ensure stable migration writer output for the rest of us.
Attachments (2)
Change History (11)
by , 16 months ago
Attachment: | 0001-Fixed-34697-Added-serializer-for-unordered-sequences.patch added |
---|
comment:1 by , 16 months ago
Cc: | added |
---|---|
Has patch: | unset |
by , 16 months ago
Attachment: | 0002-Fixed-30029-Sort-dependencies-in-MigrationWriter-to-.patch added |
---|
comment:2 by , 16 months ago
Hi, thanks for the feedback and the reference to an earlier ticket.
Actually, we've also hit the flip-flops with the dependencies, which are annoying, but we do hit them so rarely, that I actually have forgotten about them. However, we are hitting the set-caused issues every couple of days and this finally motioned me to try to do something about it. I have salvaged the old part of the patch concerning the dependencies, rebased it, fixed and it seems to work. Very nice. The result is uploaded to the ticket.
Yes, I know about the "workaround" of setting the PYTHONHASHSEED
, but this is not a practical solution for us. Maybe you can do that if you are a sole developer working alone on some Django project. You can write a wrapper script around manage.py
, or set this variable globally, or use the IDE configuration. But if you are a team of more than a dozen of developers working on a large number of Django projects collaboratively, things are getting very annoying and difficult.
We can't easily patch manage.py
to set it for makemigrations
, because it has to be set before interpreter startup. Wrapper scripts are also problematic, because each member would have to integrate it in his workflow. Finally, forcing people to set PYTHONHASHSEED
globally on their machine is a questionable solution with security implications, which is why it was introduced in the first place...
Making migrations completely deterministic is easy ;-) You just have to sort in BaseSequenceSerializer
for all sequence types, and not just for set types like I did. Only this will of course have a huge number of unwanted side-effects like force-sorted choices and so on, and it will cause hugely complicated and unnecessary code mess to write opt-outs explicitly.
Right now, all concerns raised by Simon in the past are addressed:
- Sets are (now) globally deterministic
- Dicts are serialized in an fixed insertion order
- Dependencies are now sorted explicitly as an exception*
- Other stuff (lists, tuples) are ordered, because the order has a semantic
* Or maybe a better approach would be to switch dependencies to a set if the order is not important, how would you like that?
So I think that fixing it for set types and dependencies will solve 99% of the cases for everyone without any bad side effects and no effort on the part of the users, and will not cause much maintenance & code complexity burden. For now you've got 2 tickets in 5 years for this, and if these patches are committed and we still missed a thing, then it's no problem to fix another edge case 10 years later, in as far as I'm concerned... :-)
I would be curious to hear Simon's thoughts on that.
comment:3 by , 16 months ago
Thanks for your linking the issue Mariusz and for the taking the read through it and craft a detailed answer Yury.
TL;DR I think your assessment of the current situation makes sense and I'm starting to believe we should proceed with this patch.
I'm curious of where you experience the flip-flop though. Are you generating the same migrations over and over again? What development process causing the diffing noise you are referring to given makemigrations
operates on the equality of objects and should not care about how they are serialized as long as they are equal.
Or maybe a better approach would be to switch dependencies to a set if the order is not important, how would you like that?
I think this is something that should be considered in another ticket but that we could ultimately do. The fact they are currently stored in a list today provides a false sense that their ordering is meaningful while it's not actually the case. That's a problem that is generalized to many parts of Django unfortunately (e.g. Model.Meta.unique_together
at the model definition level in another example) so I'm not sure it's actually worth the effort in fixing.
comment:4 by , 16 months ago
Summary: | Migration serializer for sets results in indeterministic output due unstable iteration order / hash randomization → Migration serializer for sets results in non-deterministic order. |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Bug → Cleanup/optimization |
Tentatively accepted.
comment:5 by , 16 months ago
Thank you for your thoughts, Simon!
I'm curious of where you experience the flip-flop though. Are you generating the same migrations over and over again?
Yes, your assessment is correct. We are only having these problems in the projects, in which we are using Django as a static site or configuration generator. In these projects we basically have the databases in YAML fixtures with changes workflow controlled by GitHub pull requests.
If we change the code of the generator itself (and specifically Django models), then we re-create the migrations completely, because keeping migration history just doesn't make sense. There is no "real" database to migrate in the first place, they just bloat the repository and slow the generators down... It's working quite nicely and the diffs are very readable, if it weren't for annoying flip-flop hunks, which one always has to remember to discard from the checkout before committing the changes.
I guess not many people use Django like this, but it seems that we are not completely alone, and by the way... did I already mention that Django is awesome?
I think this is something that should be considered in another ticket but that we could ultimately do. The fact they are currently stored in a list today provides a false sense that their ordering is meaningful while it's not actually the case.
I've had a look at the code after I voiced the idea, and I agree with you, that if this is to be done, then better in a different ticket. Unfortunately, this part of the class is templated in a different fashion than all the rest, and just turning dependencies into a set and hoping that my new serializer with take care of that won't fly. It seems that it's actually quite some work to do it properly... but I agree, that generally it's desirable, because I like your point of having the types to convey the semantic.
If there is anything else I can do that would be helpful to you according to Django process to get this in, please feel free to ping me.
comment:7 by , 16 months ago
Has patch: | set |
---|
comment:8 by , 16 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Thanks for the ticket, however I'm not sure it's worth changing as other elements may still be non-deterministic for a different set, see #30029. Setting a fixed
PYTHONHASHSEED
should make it deterministic for you.