Opened 14 months ago

Closed 14 months ago

Last modified 12 months ago

#31956 closed Bug (fixed)

QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.

Reported by: Marc DEBUREAUX Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: 3.1
Severity: Release blocker Keywords:
Cc: sage, Simon Charette, Thiago Bellini Ribeiro Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Marc DEBUREAUX)

When using ORDER BY clause after an aggregation on a JSONField as described below:

MyModel.objects.values('jsonfield__subfield').annotate(count=Count('id')).order_by('jsonfield__subfield')

I got the following error:

column "myapp_mymodel.jsonfield" must appear in the GROUP BY clause or be used in an aggregate function

The SQL query seems OK at first glance:

SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC

But it fails on PostgreSQL 12+ because ORDER BY clause doesn't include ::text casting. Instead the query must be:

SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY (("mymodel"."jsonfield" -> 'subfield'))::text ASC

Or without casting at all (prone to error?):

SELECT ("mymodel"."jsonfield" -> 'subfield'), COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY ("mymodel"."jsonfield" -> 'subfield') ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC

Change History (16)

comment:1 Changed 14 months ago by Marc DEBUREAUX

Description: modified (diff)

comment:2 Changed 14 months ago by Mariusz Felisiak

Cc: sage Simon Charette added
Owner: changed from nobody to Mariusz Felisiak
Severity: NormalRelease blocker
Status: newassigned
Summary: JSONField doesn't support ORDER BY clauses after aggregationQuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.
Triage Stage: UnreviewedAccepted

Thanks for this ticket. I was able to reproduce the crash on PostgreSQL with a JSONField with a custom decoder. It's caused by the different format used in this case. Can you confirm that the following patch fix this issue for you?

diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index a9768919a2..af0419356a 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -1211,6 +1211,7 @@ class OrderBy(BaseExpression):
                 template = '%%(expression)s IS NOT NULL, %s' % template
         connection.ops.check_expression_support(self)
         expression_sql, params = compiler.compile(self.expression)
+        expression_sql, params = self.expression.select_format(compiler, expression_sql, params)
         placeholders = {
             'expression': expression_sql,
             'ordering': 'DESC' if self.descending else 'ASC',

comment:3 Changed 14 months ago by Mariusz Felisiak

We cannot use proposed solution because it breaks ordering for non-text values.

comment:4 Changed 14 months ago by Mariusz Felisiak

Has patch: set

comment:5 Changed 14 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In 2210539:

Refs #31956 -- Added test for ordering by JSONField with a custom decoder.

comment:6 Changed 14 months ago by Mariusz Felisiak <felisiak.mariusz@…>

Resolution: fixed
Status: assignedclosed

In 0be51d2:

Fixed #31956 -- Fixed crash of ordering by JSONField with a custom decoder on PostgreSQL.

Thanks Marc Debureaux for the report.
Thanks Simon Charette, Nick Pope, and Adam Johnson for reviews.

comment:7 Changed 14 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In 655e1ce6:

[3.1.x] Fixed #31956 -- Fixed crash of ordering by JSONField with a custom decoder on PostgreSQL.

Thanks Marc Debureaux for the report.
Thanks Simon Charette, Nick Pope, and Adam Johnson for reviews.

Backport of 0be51d2226fce030ac9ca840535a524f41e9832c from master

comment:8 Changed 14 months ago by Agris Ameriks

Hello,
I would ask to reopen this ticket.

The fix breaks default functionality of JSONField.

Let me explain the issue.

I have such an model:

class Instance(models.Model):
    title = models.CharField('Title', max_length=255, blank=True)
    responses = models.JSONField(default=dict)

Code:

        query = "SELECT id, responses FROM advancedform_instance"
        with connection.cursor() as cursor:
            cursor.execute(query)
            data = cursor.fetchall()

In django 3.1 responses is returned as a dict.
In django 3.1.1 it is returned as a string.

If I remove the code that was made to fix the issue, it works correctly.

comment:9 Changed 14 months ago by Agris Ameriks

Resolution: fixed
Status: closednew

comment:10 Changed 14 months ago by Mariusz Felisiak

Resolution: fixed
Status: newclosed

Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

comment:11 in reply to:  10 Changed 14 months ago by Agris Ameriks

Sorry, I will add comment to already created bug for the regression (#31973).
Yes, I simplified the query that I have. It is much complex and it's much harder to implement it in ORM.
I think that JSONField should return the dict either way and it was working fine in 3.1.
Also the bug is related to "with a custom decoder", but in current scenario I'm not specifying different decoder.

Yes, I can use json.loads() and I was using it till Django started supporting JSONField with built-in decoder (not sure which version, but long time ago). I don't think that it is ok to return to "unsupporting" default behavior.

Replying to felixxm:

Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

comment:12 Changed 14 months ago by Mariusz Felisiak

Agris, see #31991.

comment:13 Changed 14 months ago by Thiago Bellini Ribeiro

Cc: Thiago Bellini Ribeiro added

comment:14 Changed 14 months ago by GitHub <noreply@…>

In 438b85df:

Refs #31956 -- Doc'd consequences of disabling psycopg2's JSONB typecaster.

Follow up to 0be51d2226fce030ac9ca840535a524f41e9832c.

comment:15 Changed 14 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In 6b16623b:

[3.1.x] Refs #31956 -- Doc'd consequences of disabling psycopg2's JSONB typecaster.

Follow up to 0be51d2226fce030ac9ca840535a524f41e9832c.
Backport of 438b85dfab4f16a2e709e2bcdbfefecd7bfee89e from master

comment:16 in reply to:  10 Changed 12 months ago by Andrew

How did this *manage* to break basic querying?

Replying to Mariusz Felisiak:

Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

This is just silly. Yes, there are *obviously* reasons to use raw sql over an ORM in some cases, this is not new. Putting a tiny note in a release that no one will see isn't enough. This needs to be documented front and center in the raw sql main docs and the postgres areas, because I did read those and no hint.

"just use x" is a bad solution to something that broke basic functionality, and isn't necessarily even easy to do. It turns what is a simple automapping into manual mapping, at best. I guess we could implement something that parses python typehints on result classes detects if the cursor has been effed with and is returning the wrong data type, though. How many more data types do you expect to be broken on purpose in the future?

Also, the workaround is just to add ::json to any jsonb columns in raw queries. So this is only broken for HALF types?

It feels like an easy fix for one problem which ignores some other underlying issue, breaking use cases that a few people don't personally use. If I want a query to work in the future *as written* am I going to have to start creating raw psycopg2 cursors? Because those still work!

Just to add, this is broken on a brand new django project with a single table. Not sure how custom json decoders factor into this.

Last edited 12 months ago by Andrew (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top