#34771 closed Bug (invalid)
QuerySet.order_by() crashes on constants that cannot be cast on MySQL.
Reported by: | Yitao Xiong | Owned by: | Jordan Bae |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | mysql |
Cc: | Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Although this is an extremely rare case, it does seem like to be something Django could've captured. Basically, when there's an annotated field, there's a slight difference on how the ORDER BY
SQL is constructed based on whether the field is present or not in the SELECT
statement, or in Django's world, whether the field is present in either values
or values_list
. Here's an example:
This would work fine:
>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id', 'random_stuff').order_by('random_stuff') SELECT `auth_user`.`id`, 0 AS `random_stuff` FROM `auth_user` ORDER BY `random_stuff` ASC LIMIT 21 Execution time: 0.000783s [Database: default] <QuerySet [{'id': 1, 'random_stuff': False}, {'id': 2, 'random_stuff': False}, '...(remaining elements truncated)...']> >>>
But this would break:
>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff') None Execution time: 0.000340s [Database: default] Traceback (most recent call last): File "<input>", line 1, in <module> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff') File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 256, in __repr__ data = list(self[:REPR_OUTPUT_SIZE + 1]) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 280, in __iter__ self._fetch_all() File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 1324, in _fetch_all self._result_cache = list(self._iterable_class(self)) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 109, in __iter__ for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size): File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1130, in results_iter results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql cursor.execute(sql, params) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_extensions/management/debug_cursor.py", line 50, in execute return utils.CursorWrapper.execute(self, sql, params) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute return self._execute_with_wrappers(sql, params, many=False, executor=self._execute) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers return executor(sql, params, many, context) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_mysql/apps.py", line 75, in rewrite_hook return execute(sql, params, many, context) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute with self.db.wrap_database_errors: File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute return self.cursor.execute(query, args) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute res = self._query(query) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query db.query(q) File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query _mysql.connection.query(self, query) django.db.utils.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1") >>>
If you look at the SQL query, it breaks because it didn't have an alias to reference to in the ORDER BY
statement, therefore it grabs the entire annotation expression and throw it in there:
>>> print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query) SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC >>>
This is under MySQL 8.0.33 by the way. Not sure if it's just syntax not supported by MySQL.
Since Django doesn't seem to require an annotated field to be present in values
or values_list
to be used in order_by
, my humble opinion is that it should be slightly smarter for this case by implicitly adding the field into the SELECT
statement, and use its alias in the ORDER BY
.
Thanks for taking a look, and feel free to let me know if you need more information.
Change History (13)
comment:1 by , 13 months ago
Description: | modified (diff) |
---|
comment:2 by , 13 months ago
Description: | modified (diff) |
---|
comment:3 by , 13 months ago
Cc: | added |
---|---|
Keywords: | mysql added; SQL syntax order_by annotate removed |
comment:4 by , 13 months ago
Thanks for your reply Natalia!
I tried this with main
and it behaved the same:
In [5]: User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff') .... ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1") In [6]: print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query) SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC In [7]: from django import VERSION In [8]: VERSION Out[8]: (5, 0, 0, 'alpha', 0) In [9]:
For our specific scenario that caused this error, it wasn't really an intentional skipping on the annotated fields, it's more like this (it also wasn't the User model at all, using it to avoid exposing our business logics, apologize if that added confusions):
# Some generic queryset pre-assembling with annotations base_queryset = User.objects.annotate(random_field...).order_by('random_field') ... # Later in a specific path, needing some extra information from the base_queryset additional_lookup = base_queryset.filter(...).values('id', 'email')
So it's more like the field was annotated first, and developer just didn't care about it when later reusing the queryset. Using my example, this query with the annotate
and values
changed orders will produce the same error:
User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).order_by('random_stuff').values('id') ... ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")
To real-life use cases, I can see some cases when you want the rows returned in a specific order, but doesn't care about the value it's been ordered by.
A user-login log table can be an example - you want the last 10 logged in users, but you don't really care about when they logged in. If we specific are looking for order with annotations, maybe an API usage log table would be a good example? You wanna see the 10 users who used the API last so you annotate with their max access time and order by it, without needing the actual values.
comment:5 by , 13 months ago
Oh, an interesting discovery I just found when doing more experiments.
This doesn't seem to be an issue that applies to all fields. For instance, I just found out that the same pattern worked for IntegerField
:
In [20]: User.objects.annotate(random_stuff=Value(1, output_field=IntegerField())).order_by('random_stuff').values('id') Out[20]: <QuerySet [{'id': 1}, {'id': 2}, '...(remaining elements truncated)...']>
So I'm guess it's something specific with MySQL, given your example also showed the boolean query worked in Postgres.
comment:6 by , 13 months ago
Accepting since as per docs, it seems that the only valid datatypes
for a CAST
call are:
DATE Converts value to DATE. Format: "YYYY-MM-DD" DATETIME Converts value to DATETIME. Format: "YYYY-MM-DD HH:MM:SS" DECIMAL Converts value to DECIMAL. Use the optional M and D parameters to specify the maximum number of digits (M) and the number of digits following the decimal point (D). TIME Converts value to TIME. Format: "HH:MM:SS" CHAR Converts value to CHAR (a fixed length string) NCHAR Converts value to NCHAR (like CHAR, but produces a string with the national character set) SIGNED Converts value to SIGNED (a signed 64-bit integer) UNSIGNED Converts value to UNSIGNED (an unsigned 64-bit integer) BINARY Converts value to BINARY (a binary string)
Will write a regression test and bisect to confirm whether this would need a backport or not.
comment:7 by , 13 months ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:8 by , 13 months ago
Summary: | order_by on annotated field that's not present in values/values_list causes SQL syntax error → CAST on MySQL failing for unsupported datatypes |
---|
comment:10 by , 13 months ago
Resolution: | → invalid |
---|---|
Status: | assigned → closed |
Summary: | CAST on MySQL failing for unsupported datatypes → QuerySet.order_by() crashes on constants that cannot be cast on MySQL. |
Triage Stage: | Accepted → Unreviewed |
CAST
is required when ordering by constant value (see #26192, #31496), and ordering by constants is really niche in itself. As far as I'm aware there is nothing that Django could improve/fix here, this is a database limitation (and we cannot document all database caveats). You can order by e.g. an integer constant.
follow-up: 12 comment:11 by , 13 months ago
I guess it's my bad that I didn't clarify myself in the beginning - the ask was not really to support this case, but to make it clearer why it's broken so people who's less familiar with SQL knows how to move forward. Many of our developers have been working with Django for years, but this is our first time seeing such an error and we all scratched our head for quite a bit to come to the bottom of it. To me personally, a SQL syntax error would be the last thing I'd expect from an ORM.
I was thinking to have Django block unsupported CASTs in this case in the beginning. It'll be ideal that this query was never attempted to run in MySQL. You said we can't document all DB caveats, I imagine there might be too many? Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?
comment:12 by , 13 months ago
Can we have Django block unsupported CASTs in this case?
This would force us to maintain list of unsupported transitions for each database.
You said we can't document all DB caveats, I imagine there might be too many?
Of course, database documentations are thousands of pages long, we don't want to duplicate them in Django docs.
Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?
MySQL throws a completely generic error (1064 - error in SQL syntax) which could be anything, there is nothing that we can do with it.
Hello tinyx, thank you for your ticket.
My first thought when reading the description is that "it makes no sense" to order a query using a field that was not included in
values
list, though I can't find docs that explicitly confirm my suspicion.I tested this in Postgresql (15.3) and obtained a successful query run:
Nevertheless, I would like to understand your use case a little bit more. I can imagine you defined
random_stuff
purposely silly to simplify the example, but would you have a more concrete/real use case that you could share with us?Also, is there any chance that you can test this same example in Django 4.2 (or even better,
main
)? I don't have a handy mysql to test myself at this time.Lastly, I'm cc'ing Simon who may have more specific thoughts on the matter.