#34771 closed Bug (invalid)
QuerySet.order_by() crashes on constants that cannot be cast on MySQL.
| Reported by: | Yitao Xiong | Owned by: | Jordan Bae |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.2 |
| Severity: | Normal | Keywords: | mysql |
| Cc: | Simon Charette | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Although this is an extremely rare case, it does seem like to be something Django could've captured. Basically, when there's an annotated field, there's a slight difference on how the ORDER BY SQL is constructed based on whether the field is present or not in the SELECT statement, or in Django's world, whether the field is present in either values or values_list. Here's an example:
This would work fine:
>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id', 'random_stuff').order_by('random_stuff')
SELECT `auth_user`.`id`,
0 AS `random_stuff`
FROM `auth_user`
ORDER BY `random_stuff` ASC
LIMIT 21
Execution time: 0.000783s [Database: default]
<QuerySet [{'id': 1, 'random_stuff': False}, {'id': 2, 'random_stuff': False}, '...(remaining elements truncated)...']>
>>>
But this would break:
>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
None
Execution time: 0.000340s [Database: default]
Traceback (most recent call last):
File "<input>", line 1, in <module>
User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 256, in __repr__
data = list(self[:REPR_OUTPUT_SIZE + 1])
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 280, in __iter__
self._fetch_all()
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 1324, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 109, in __iter__
for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1130, in results_iter
results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
cursor.execute(sql, params)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_extensions/management/debug_cursor.py", line 50, in execute
return utils.CursorWrapper.execute(self, sql, params)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_mysql/apps.py", line 75, in rewrite_hook
return execute(sql, params, many, context)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
with self.db.wrap_database_errors:
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
return self.cursor.execute(query, args)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
res = self._query(query)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
db.query(q)
File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
_mysql.connection.query(self, query)
django.db.utils.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")
>>>
If you look at the SQL query, it breaks because it didn't have an alias to reference to in the ORDER BY statement, therefore it grabs the entire annotation expression and throw it in there:
>>> print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query)
SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC
>>>
This is under MySQL 8.0.33 by the way. Not sure if it's just syntax not supported by MySQL.
Since Django doesn't seem to require an annotated field to be present in values or values_list to be used in order_by, my humble opinion is that it should be slightly smarter for this case by implicitly adding the field into the SELECT statement, and use its alias in the ORDER BY.
Thanks for taking a look, and feel free to let me know if you need more information.
Change History (13)
comment:1 by , 2 years ago
| Description: | modified (diff) |
|---|
comment:2 by , 2 years ago
| Description: | modified (diff) |
|---|
comment:3 by , 2 years ago
| Cc: | added |
|---|---|
| Keywords: | mysql added; SQL syntax order_by annotate removed |
comment:4 by , 2 years ago
Thanks for your reply Natalia!
I tried this with main and it behaved the same:
In [5]: User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
....
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")
In [6]: print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query)
SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC
In [7]: from django import VERSION
In [8]: VERSION
Out[8]: (5, 0, 0, 'alpha', 0)
In [9]:
For our specific scenario that caused this error, it wasn't really an intentional skipping on the annotated fields, it's more like this (it also wasn't the User model at all, using it to avoid exposing our business logics, apologize if that added confusions):
# Some generic queryset pre-assembling with annotations
base_queryset = User.objects.annotate(random_field...).order_by('random_field')
...
# Later in a specific path, needing some extra information from the base_queryset
additional_lookup = base_queryset.filter(...).values('id', 'email')
So it's more like the field was annotated first, and developer just didn't care about it when later reusing the queryset. Using my example, this query with the annotate and values changed orders will produce the same error:
User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).order_by('random_stuff').values('id')
...
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")
To real-life use cases, I can see some cases when you want the rows returned in a specific order, but doesn't care about the value it's been ordered by.
A user-login log table can be an example - you want the last 10 logged in users, but you don't really care about when they logged in. If we specific are looking for order with annotations, maybe an API usage log table would be a good example? You wanna see the 10 users who used the API last so you annotate with their max access time and order by it, without needing the actual values.
comment:5 by , 2 years ago
Oh, an interesting discovery I just found when doing more experiments.
This doesn't seem to be an issue that applies to all fields. For instance, I just found out that the same pattern worked for IntegerField:
In [20]: User.objects.annotate(random_stuff=Value(1, output_field=IntegerField())).order_by('random_stuff').values('id')
Out[20]: <QuerySet [{'id': 1}, {'id': 2}, '...(remaining elements truncated)...']>
So I'm guess it's something specific with MySQL, given your example also showed the boolean query worked in Postgres.
comment:6 by , 2 years ago
Accepting since as per docs, it seems that the only valid datatypes for a CAST call are:
DATE Converts value to DATE. Format: "YYYY-MM-DD" DATETIME Converts value to DATETIME. Format: "YYYY-MM-DD HH:MM:SS" DECIMAL Converts value to DECIMAL. Use the optional M and D parameters to specify the maximum number of digits (M) and the number of digits following the decimal point (D). TIME Converts value to TIME. Format: "HH:MM:SS" CHAR Converts value to CHAR (a fixed length string) NCHAR Converts value to NCHAR (like CHAR, but produces a string with the national character set) SIGNED Converts value to SIGNED (a signed 64-bit integer) UNSIGNED Converts value to UNSIGNED (an unsigned 64-bit integer) BINARY Converts value to BINARY (a binary string)
Will write a regression test and bisect to confirm whether this would need a backport or not.
comment:7 by , 2 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:8 by , 2 years ago
| Summary: | order_by on annotated field that's not present in values/values_list causes SQL syntax error → CAST on MySQL failing for unsupported datatypes |
|---|
comment:10 by , 2 years ago
| Resolution: | → invalid |
|---|---|
| Status: | assigned → closed |
| Summary: | CAST on MySQL failing for unsupported datatypes → QuerySet.order_by() crashes on constants that cannot be cast on MySQL. |
| Triage Stage: | Accepted → Unreviewed |
CAST is required when ordering by constant value (see #26192, #31496), and ordering by constants is really niche in itself. As far as I'm aware there is nothing that Django could improve/fix here, this is a database limitation (and we cannot document all database caveats). You can order by e.g. an integer constant.
follow-up: 12 comment:11 by , 2 years ago
I guess it's my bad that I didn't clarify myself in the beginning - the ask was not really to support this case, but to make it clearer why it's broken so people who's less familiar with SQL knows how to move forward. Many of our developers have been working with Django for years, but this is our first time seeing such an error and we all scratched our head for quite a bit to come to the bottom of it. To me personally, a SQL syntax error would be the last thing I'd expect from an ORM.
I was thinking to have Django block unsupported CASTs in this case in the beginning. It'll be ideal that this query was never attempted to run in MySQL. You said we can't document all DB caveats, I imagine there might be too many? Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?
comment:12 by , 2 years ago
Can we have Django block unsupported CASTs in this case?
This would force us to maintain list of unsupported transitions for each database.
You said we can't document all DB caveats, I imagine there might be too many?
Of course, database documentations are thousands of pages long, we don't want to duplicate them in Django docs.
Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?
MySQL throws a completely generic error (1064 - error in SQL syntax) which could be anything, there is nothing that we can do with it.
Hello tinyx, thank you for your ticket.
My first thought when reading the description is that "it makes no sense" to order a query using a field that was not included in
valueslist, though I can't find docs that explicitly confirm my suspicion.I tested this in Postgresql (15.3) and obtained a successful query run:
>>> print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query) SELECT "auth_user"."id" FROM "auth_user" ORDER BY (False)::boolean ASC >>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff') <QuerySet [{'id': 2}, {'id': 3}, {'id': 1}, {'id': 4}, {'id': 5}, {'id': 6}, {'id': 7}]> >>>Nevertheless, I would like to understand your use case a little bit more. I can imagine you defined
random_stuffpurposely silly to simplify the example, but would you have a more concrete/real use case that you could share with us?Also, is there any chance that you can test this same example in Django 4.2 (or even better,
main)? I don't have a handy mysql to test myself at this time.Lastly, I'm cc'ing Simon who may have more specific thoughts on the matter.