#31956 closed Bug (fixed)
QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.
| Reported by: | Marc DEBUREAUX | Owned by: | Mariusz Felisiak |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.1 |
| Severity: | Release blocker | Keywords: | |
| Cc: | sage, Simon Charette, Thiago Bellini Ribeiro | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When using ORDER BY clause after an aggregation on a JSONField as described below:
MyModel.objects.values('jsonfield__subfield').annotate(count=Count('id')).order_by('jsonfield__subfield')
I got the following error:
column "myapp_mymodel.jsonfield" must appear in the GROUP BY clause or be used in an aggregate function
The SQL query seems OK at first glance:
SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC
But it fails on PostgreSQL 12+ because ORDER BY clause doesn't include ::text casting. Instead the query must be:
SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY (("mymodel"."jsonfield" -> 'subfield'))::text ASC
Or without casting at all (prone to error?):
SELECT ("mymodel"."jsonfield" -> 'subfield'), COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY ("mymodel"."jsonfield" -> 'subfield') ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC
Change History (18)
comment:1 by , 5 years ago
| Description: | modified (diff) |
|---|
comment:2 by , 5 years ago
| Cc: | added |
|---|---|
| Owner: | changed from to |
| Severity: | Normal → Release blocker |
| Status: | new → assigned |
| Summary: | JSONField doesn't support ORDER BY clauses after aggregation → QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL. |
| Triage Stage: | Unreviewed → Accepted |
comment:3 by , 5 years ago
We cannot use proposed solution because it breaks ordering for non-text values.
comment:8 by , 5 years ago
Hello,
I would ask to reopen this ticket.
The fix breaks default functionality of JSONField.
Let me explain the issue.
I have such an model:
class Instance(models.Model):
title = models.CharField('Title', max_length=255, blank=True)
responses = models.JSONField(default=dict)
Code:
query = "SELECT id, responses FROM advancedform_instance"
with connection.cursor() as cursor:
cursor.execute(query)
data = cursor.fetchall()
In django 3.1 responses is returned as a dict.
In django 3.1.1 it is returned as a string.
If I remove the code that was made to fix the issue, it works correctly.
comment:9 by , 5 years ago
| Resolution: | fixed |
|---|---|
| Status: | closed → new |
follow-ups: 11 16 comment:10 by , 5 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.
comment:11 by , 5 years ago
Sorry, I will add comment to already created bug for the regression (#31973).
Yes, I simplified the query that I have. It is much complex and it's much harder to implement it in ORM.
I think that JSONField should return the dict either way and it was working fine in 3.1.
Also the bug is related to "with a custom decoder", but in current scenario I'm not specifying different decoder.
Yes, I can use json.loads() and I was using it till Django started supporting JSONField with built-in decoder (not sure which version, but long time ago). I don't think that it is ok to return to "unsupporting" default behavior.
Replying to felixxm:
Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use
json.loads()on fetched data.
comment:13 by , 5 years ago
| Cc: | added |
|---|
comment:16 by , 5 years ago
How did this *manage* to break basic querying?
Replying to Mariusz Felisiak:
Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use
json.loads()on fetched data.
This is just silly. Yes, there are *obviously* reasons to use raw sql over an ORM in some cases, this is not new. Putting a tiny note in a release that no one will see isn't enough. This needs to be documented front and center in the raw sql main docs and the postgres areas, because I did read those and no hint.
"just use x" is a bad solution to something that broke basic functionality, and isn't necessarily even easy to do. It turns what is a simple automapping into manual mapping, at best. I guess we could implement something that parses python typehints on result classes detects if the cursor has been effed with and is returning the wrong data type, though. How many more data types do you expect to be broken on purpose in the future?
Also, the workaround is just to add ::json to any jsonb columns in raw queries. So this is only broken for HALF types?
It feels like an easy fix for one problem which ignores some other underlying issue, breaking use cases that a few people don't personally use. If I want a query to work in the future *as written* am I going to have to start creating raw psycopg2 cursors? Because those still work!
Just to add, this is broken on a brand new django project with a single table. Not sure how custom json decoders factor into this.
comment:18 by , 2 years ago
For anyone that falls into the "why isn't it a pythonic object already?" pitfall - We've been able to achieve our existing patterns, post this change, using:
- Take complete control over the
psycopg2.extras.register_default_jsonbfunction, providing our own globally registered load call. Note: You can supply an alternate JSONloadsfunction to the_original_jsonb_functionas required and it will stick (hooray!)
import psycopg2
_original_jsonb_function = None
def _handle_jsonb():
"""
Workaround for pipeline-altering solution in:
https://code.djangoproject.com/ticket/31956
"""
global _original_jsonb_function
def _ignore_command(*args, **kwargs):
pass
if _original_jsonb_function is None:
_original_jsonb_function = psycopg2.extras.register_default_jsonb
# _original_jsonb_function(globally=True, loads=...) if required
psycopg2.extras.register_default_jsonb = _ignore_command
class CoreConfig(AppConfig):
name = 'core'
verbose_name = 'Core'
def ready(self):
_handle_jsonb()
- Overloading the JSONField model and returning the data unharmed. This is key to allow both raw and ORM queries to operate as expected. You'll need to replace all instances of
models.JSONFieldin your application.
class DirectJSONField(models.JSONField):
""" ... """
def from_db_value(self, value, expression, connection):
if isinstance(value, (dict, list)):
return value
return super().from_db_value(value, expression, connection)
- (alternate) If you're not inclined to replace all of your JSONFields, you can monkey patch the class.
def from_db_value(self, value, expression, connection):
if isinstance(value, (dict, list)):
return value
return super().from_db_value(value, expression, connection)
models.JSONField.from_db_value = from_db_value
Thanks for this ticket. I was able to reproduce the crash on PostgreSQL with a
JSONFieldwith a custom decoder. It's caused by the different format used in this case. Can you confirm that the following patch fix this issue for you?diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py index a9768919a2..af0419356a 100644 --- a/django/db/models/expressions.py +++ b/django/db/models/expressions.py @@ -1211,6 +1211,7 @@ class OrderBy(BaseExpression): template = '%%(expression)s IS NOT NULL, %s' % template connection.ops.check_expression_support(self) expression_sql, params = compiler.compile(self.expression) + expression_sql, params = self.expression.select_format(compiler, expression_sql, params) placeholders = { 'expression': expression_sql, 'ordering': 'DESC' if self.descending else 'ASC',