Opened 6 years ago

Closed 6 years ago

#29215 closed Cleanup/optimization (invalid)

Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Reported by: Matt Fisher Owned by: nobody
Component: Documentation Version: 2.0
Severity: Normal Keywords: documentation
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Python 3.6 introduces a change that preserves the ordering of keyword arguments - PEP 468: Preserving Keyword Argument Order

Since this is not a difference between Python 2 and 3 per se, it is not mentioned in the “Porting to Python 3” Django 1.11 topic page.

Preserving keyword argument ordering can change the SQL produced by QuerySet methods that accept **kwargs, specifically .get, .filter, and .exclude, and potentially also .values, .annotate, .aggregate, .create, .get_or_create, .update, and .update_or_create.

As an example,
.filter(b__in=some_qs, a=1)
could produce
... WHERE (a=1 AND b IN (SELECT …)) in Python < 3.6 and
... WHERE (b IN (SELECT …) AND a=1) in Python >= 3.6.
The query is semantically equivalent but the postgres query planner can produce dramatically different query plans and resultant performance if the query has a significant number of joins.

We had one particular frequently-run query that ran in 2-3 seconds in Python 2, and 80+ seconds in Python 3.6. The problem was only apparent in production, and presented as severely degraded responsiveness from the postgres RDS database. The additional load on the database made most other queries take longer as well, which made it difficult to track down the cause as it was not obvious that any queries could have changed. To compound the problem, the different query execution caused many more temp block writes on the RDS instance, which burned through our IOPS burst balance and further degraded site performance. Because the query didn’t change semantically, all our automated tests passed and we didn’t see the issue until it was in production under full load. Googling the problem produced no good results, which suggests it is not common, but in need of a useful resource.

People may be affected by this change when upgrading from Python 2.7 to Python 3.6+ using Django 1.x, in which case it would be helpful to have a warning in the 1.11 porting topic, but it will also occur when people are upgrading from Python 3.[0-5] to 3.6+ while potentially using Django 2, in which case the porting topic is no longer in the current docs. Not sure what would be an appropriate place for a warning in this case.

Change History (3)

comment:1 by Simon Charette, 6 years ago

Cc: Simon Charette added

This behavior has little to do with Python 3.6 itself.

If you had query performing better on Python 2.7 than on Python 3 it's probably because you weren't using a random hash seed which is the default on Python 3. In other words, you were relying on a stable hash seed to build queries in a way that PostgreSQL query planner's would correctly optimize. If you had updated from 2.7 to Python 3.5 instead you would have hit the slow query issue from time to time when the interpreter chose an hash seed that made list({'b__in', 'a'}) == ['b__in', 'a']. In some way Python 3.6 allows you to reliably build your WHERE statement through kwargs ordering while it wasn't possible before.

I guess we could add a section about the fact Python 3 has random hash seeding on by default and that may break code making assumptions about unordered data structures ordering. Maybe we should also suggest running Python 2.7 with the -R flag on when porting code?

comment:2 by Tim Graham, 6 years ago

I think the documentation suggests that .filter(b__in=some_qs, a=1) is equivalent to .filter(a=1, b__in=some_qs). I'm not sure if we should document the subtleties you brought up about the actual SQL that's produced. I doubt there are any tests for this "feature." If we don't document it, then documenting a possible change regarding it doesn't make much sense. Well, it's now informally documented with this ticket.

By the way, we're no longer updating the Django 1.11 documentation, which includes the "porting to Python 3" document. The Django 1.11 documentation recommends using python -R for a random hash seed, when using Python 2.

comment:3 by Carlton Gibson, 6 years ago

Resolution: invalid
Status: newclosed

There's not much we can realistically do here.

I'd say it would be border-line for whether to include it in the 1.11 porting topic page, but since that's no longer maintained that question is moot.

I feel the issue itself out of scope for Django: we're generating correct SQL, which is meant to be declarative, i.e. independent of how the RDMS implements the query.
This falls with any number of backend specific quirks that we just can't get into. (It's interesting what the Postgres folks would say about it.)

Note: See TracTickets for help on using tickets.
Back to Top