Opened 9 months ago

Closed 9 months ago

Last modified 9 months ago

#34771 closed Bug (invalid)

QuerySet.order_by() crashes on constants that cannot be cast on MySQL.

Reported by: Yitao Xiong Owned by: Jordan Bae
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: mysql
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Yitao Xiong)

Although this is an extremely rare case, it does seem like to be something Django could've captured. Basically, when there's an annotated field, there's a slight difference on how the ORDER BY SQL is constructed based on whether the field is present or not in the SELECT statement, or in Django's world, whether the field is present in either values or values_list. Here's an example:

This would work fine:

>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id', 'random_stuff').order_by('random_stuff')
SELECT `auth_user`.`id`,
       0 AS `random_stuff`
  FROM `auth_user`
 ORDER BY `random_stuff` ASC
 LIMIT 21

Execution time: 0.000783s [Database: default]
<QuerySet [{'id': 1, 'random_stuff': False}, {'id': 2, 'random_stuff': False}, '...(remaining elements truncated)...']>
>>> 

But this would break:

>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
  None

Execution time: 0.000340s [Database: default]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 256, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 1324, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/query.py", line 109, in __iter__
    for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1130, in results_iter
    results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
    cursor.execute(sql, params)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_extensions/management/debug_cursor.py", line 50, in execute
    return utils.CursorWrapper.execute(self, sql, params)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django_mysql/apps.py", line 75, in rewrite_hook
    return execute(sql, params, many, context)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/Users/tinyx/.pyenv/versions/portal/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")
>>> 

If you look at the SQL query, it breaks because it didn't have an alias to reference to in the ORDER BY statement, therefore it grabs the entire annotation expression and throw it in there:

>>> print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query)
SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC
>>> 

This is under MySQL 8.0.33 by the way. Not sure if it's just syntax not supported by MySQL.

Since Django doesn't seem to require an annotated field to be present in values or values_list to be used in order_by, my humble opinion is that it should be slightly smarter for this case by implicitly adding the field into the SELECT statement, and use its alias in the ORDER BY.

Thanks for taking a look, and feel free to let me know if you need more information.

Change History (13)

comment:1 by Yitao Xiong, 9 months ago

Description: modified (diff)

comment:2 by Yitao Xiong, 9 months ago

Description: modified (diff)

comment:3 by Natalia Bidart, 9 months ago

Cc: Simon Charette added
Keywords: mysql added; SQL syntax order_by annotate removed

Hello tinyx, thank you for your ticket.

My first thought when reading the description is that "it makes no sense" to order a query using a field that was not included in values list, though I can't find docs that explicitly confirm my suspicion.

I tested this in Postgresql (15.3) and obtained a successful query run:

 >>> print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query)
SELECT "auth_user"."id" FROM "auth_user" ORDER BY (False)::boolean ASC
>>> User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
<QuerySet [{'id': 2}, {'id': 3}, {'id': 1}, {'id': 4}, {'id': 5}, {'id': 6}, {'id': 7}]>
>>> 

Nevertheless, I would like to understand your use case a little bit more. I can imagine you defined random_stuff purposely silly to simplify the example, but would you have a more concrete/real use case that you could share with us?

Also, is there any chance that you can test this same example in Django 4.2 (or even better, main)? I don't have a handy mysql to test myself at this time.

Lastly, I'm cc'ing Simon who may have more specific thoughts on the matter.

comment:4 by Yitao Xiong, 9 months ago

Thanks for your reply Natalia!

I tried this with main and it behaved the same:

In [5]: User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff')
....
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")

In [6]: print(User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).values('id').order_by('random_stuff').query)
SELECT `auth_user`.`id` FROM `auth_user` ORDER BY CAST(False AS bool) ASC


In [7]: from django import VERSION

In [8]: VERSION
Out[8]: (5, 0, 0, 'alpha', 0)

In [9]: 

For our specific scenario that caused this error, it wasn't really an intentional skipping on the annotated fields, it's more like this (it also wasn't the User model at all, using it to avoid exposing our business logics, apologize if that added confusions):

# Some generic queryset pre-assembling with annotations
base_queryset = User.objects.annotate(random_field...).order_by('random_field')

...
# Later in a specific path, needing some extra information from the base_queryset
additional_lookup = base_queryset.filter(...).values('id', 'email')

So it's more like the field was annotated first, and developer just didn't care about it when later reusing the queryset. Using my example, this query with the annotate and values changed orders will produce the same error:

User.objects.annotate(random_stuff=Value(False, output_field=BooleanField())).order_by('random_stuff').values('id')
...

ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'bool) ASC LIMIT 21' at line 1")

To real-life use cases, I can see some cases when you want the rows returned in a specific order, but doesn't care about the value it's been ordered by.

A user-login log table can be an example - you want the last 10 logged in users, but you don't really care about when they logged in. If we specific are looking for order with annotations, maybe an API usage log table would be a good example? You wanna see the 10 users who used the API last so you annotate with their max access time and order by it, without needing the actual values.

comment:5 by Yitao Xiong, 9 months ago

Oh, an interesting discovery I just found when doing more experiments.

This doesn't seem to be an issue that applies to all fields. For instance, I just found out that the same pattern worked for IntegerField:

In [20]: User.objects.annotate(random_stuff=Value(1, output_field=IntegerField())).order_by('random_stuff').values('id')
Out[20]: <QuerySet [{'id': 1}, {'id': 2}, '...(remaining elements truncated)...']>

So I'm guess it's something specific with MySQL, given your example also showed the boolean query worked in Postgres.

comment:6 by Natalia Bidart, 9 months ago

Accepting since as per docs, it seems that the only valid datatypes for a CAST call are:

DATE 	Converts value to DATE. Format: "YYYY-MM-DD"
DATETIME 	Converts value to DATETIME. Format: "YYYY-MM-DD HH:MM:SS"
DECIMAL 	Converts value to DECIMAL. Use the optional M and D parameters to specify the maximum number of digits (M) and the number of digits following the decimal point (D).
TIME 	Converts value to TIME. Format: "HH:MM:SS"
CHAR 	Converts value to CHAR (a fixed length string)
NCHAR 	Converts value to NCHAR (like CHAR, but produces a string with the national character set)
SIGNED 	Converts value to SIGNED (a signed 64-bit integer)
UNSIGNED 	Converts value to UNSIGNED (an unsigned 64-bit integer)
BINARY 	Converts value to BINARY (a binary string)

Will write a regression test and bisect to confirm whether this would need a backport or not.

comment:7 by Natalia Bidart, 9 months ago

Triage Stage: UnreviewedAccepted

comment:8 by Natalia Bidart, 9 months ago

Summary: order_by on annotated field that's not present in values/values_list causes SQL syntax errorCAST on MySQL failing for unsupported datatypes

comment:9 by Jordan Bae, 9 months ago

Owner: changed from nobody to Jordan Bae
Status: newassigned

Could i look into it?

comment:10 by Mariusz Felisiak, 9 months ago

Resolution: invalid
Status: assignedclosed
Summary: CAST on MySQL failing for unsupported datatypesQuerySet.order_by() crashes on constants that cannot be cast on MySQL.
Triage Stage: AcceptedUnreviewed

CAST is required when ordering by constant value (see #26192, #31496), and ordering by constants is really niche in itself. As far as I'm aware there is nothing that Django could improve/fix here, this is a database limitation (and we cannot document all database caveats). You can order by e.g. an integer constant.

comment:11 by Yitao Xiong, 9 months ago

I guess it's my bad that I didn't clarify myself in the beginning - the ask was not really to support this case Just realized I lied there... I did ask for a fix, mainly because the different behaviors between .values() then .annotate() VS .annotate() then .values() made me thinking that this was a case that Django could make work. But if implicitly adding the annotated field to SELECT is too tricky and bug prone, I think it'll be great if we can make it clearer why it's broken so people who's less familiar with SQL knows how to move forward. Many of our developers have been working with Django for years, but this is our first time seeing such an error and we all scratched our head for quite a bit to come to the bottom of it. To me personally, a SQL syntax error would be the last thing I'd expect from an ORM.

Can we have Django block unsupported CASTs in this case? It'll be ideal that this query was never attempted to run in MySQL. You said we can't document all DB caveats, I imagine there might be too many? Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?

Last edited 9 months ago by Yitao Xiong (previous) (diff)

in reply to:  11 comment:12 by Mariusz Felisiak, 9 months ago

Can we have Django block unsupported CASTs in this case?

This would force us to maintain list of unsupported transitions for each database.

You said we can't document all DB caveats, I imagine there might be too many?

Of course, database documentations are thousands of pages long, we don't want to duplicate them in Django docs.

Not sure how feasible that idea is then... Another idea would be maybe capture this error and throw an ORM layer error instead?

MySQL throws a completely generic error (1064 - error in SQL syntax) which could be anything, there is nothing that we can do with it.

comment:13 by Yitao Xiong, 9 months ago

Understood, that makes sense. Thanks for looking at this.

Note: See TracTickets for help on using tickets.
Back to Top