Opened 3 months ago

Closed 3 months ago

#29416 closed Bug (fixed)

Undesired subquery added to the GROUP BY clause

Reported by: Antoine Pinsard Owned by: felixxm
Component: Database layer (models, ORM) Version: 2.0
Severity: Release blocker Keywords: groupby, subquery
Cc: felixxm Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Antoine Pinsard)

I am facing an issue while upgrading from Django 1.11 to Django 2.0.

I have a complex query interacting with a legacy MySQL database, which I simplified below to highlight the issue:

>>> from user.models import Sponsor
>>> from django.db.models import ExpressionWrapper, Count, DecimalField
>>> from django.db.models.expressions import RawSQL
>>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", [])
>>> str(Sponsor.objects.all().annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField())).order_by('-report_rate').query)

This code, in Django 1.11.9, gives me the following query:

SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId` ORDER BY `report_rate` DESC

This is the expected behavior and it works well.

However, in Django 2.0.5, the same code gives me this query:

SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2)
ORDER BY `report_rate` DESC

As you can see, the ORM appended the subquery (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) to the GROUP BY clause. Which is wrong, and takes forever to execute.



I tried to play with .values('id') or such as I usually do when I get unexpected GROUP BY. I spent an afternoon on it but there's no way I could get rid of this undesired group by clause. The order_by is not to blame either. Here is another example of what I tried:

str(Sponsor.objects.all().values('id').annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField())).order_by().query)

Which gives:

SELECT `ala_sponsor`.`sponId`, ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate`
FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`)
GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2)
ORDER BY NULL

Also note that this is the annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField())) that causes this issue. If I only do annotate(nb_reports=nb_reports) or annotate(nb_deliveries=COUNT('deliveries')) there is no additional GROUP BY clause generated.

In [40]: str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports).order_by().query)
Out[40]: "SELECT `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) AS `nb_reports` FROM `ala_sponsor`"

In [41]: str(Sponsor.objects.all().values('id').annotate(nb_deliveries=Count('deliveries')).order_by().query)
Out[41]: 'SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`) AS `nb_deliveries` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId` ORDER BY NULL'

In [42]: str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports, nb_deliveries=Count('deliveries')).order_by().query)
Out[42]: "SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`) AS `nb_deliveries`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) AS `nb_reports` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) ORDER BY NULL"

Attachments (1)

test_regression_29416.py (1.2 KB) - added by Antoine Pinsard 3 months ago.
tests/annotations/test_regression_29416.py

Download all attachments as: .zip

Change History (24)

comment:1 Changed 3 months ago by Antoine Pinsard

Description: modified (diff)

comment:2 Changed 3 months ago by Antoine Pinsard

Description: modified (diff)

comment:3 Changed 3 months ago by Tim Graham

Can you bisect to find where the behavior changed?

comment:4 Changed 3 months ago by Antoine Pinsard

So, here is the commit that introduced the change of behavior: https://github.com/django/django/commit/1d070d027c218285b66c0bde8079034b33a87f11

Changed 3 months ago by Antoine Pinsard

Attachment: test_regression_29416.py added

tests/annotations/test_regression_29416.py

comment:5 Changed 3 months ago by Tim Graham

Cc: felixxm added

comment:6 Changed 3 months ago by Antoine Pinsard

I think the issue is the getattr(expr, 'alias', None) not in pk_aliases. Maybe it should be getattr(expr, 'alias', None) not in pk_aliases and not isinstance(expr, RawSQL) or something like that?

comment:7 Changed 3 months ago by felixxm

Owner: changed from nobody to felixxm
Status: newassigned

Probably this condition should be more complex. I will dig into it on DjangoConEU sprints.

comment:8 Changed 3 months ago by Antoine Pinsard

OK, FYI I was able to workaround the issue:

>>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", [])
>>> nb_reports.alias = 'nb_reports'
>>> nb_reports.target = type('bare', (object,), {'primary_key': True})

So I will be able to upgrade to 2.0 meanwhile.

comment:9 Changed 3 months ago by Tim Graham

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

comment:10 Changed 3 months ago by felixxm

Has patch: set

comment:11 Changed 3 months ago by Tim Graham

Triage Stage: AcceptedReady for checkin

comment:12 Changed 3 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 4ab1f559:

Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause on MySQL when using a RawSQL annotation.

Regression in 1d070d027c218285b66c0bde8079034b33a87f11.

comment:13 Changed 3 months ago by Tim Graham <timograham@…>

In b6e48f51:

[2.1.x] Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause on MySQL when using a RawSQL annotation.

Regression in 1d070d027c218285b66c0bde8079034b33a87f11.
Backport of 4ab1f559e8d1264bcb20bb497988973194f5d8f2 from master

comment:14 Changed 3 months ago by Tim Graham <timograham@…>

In 74bbef4e:

[2.0.x] Fixed #29416 -- Removed unnecesary subquery from GROUP BY clause on MySQL when using a RawSQL annotation.

Regression in 1d070d027c218285b66c0bde8079034b33a87f11.

Backport of 4ab1f559e8d1264bcb20bb497988973194f5d8f2 from master

comment:15 Changed 3 months ago by Tim Graham

Has patch: unset
Resolution: fixed
Status: closednew
Triage Stage: Ready for checkinAccepted

This broke a MySQL GIS test on MySQL 5.7:

gis_tests.geoapp.test_expressions.GeoExpressionsTests.test_multiple_annotation

Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test_django.geoapp_multifields.point' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")

comment:16 Changed 3 months ago by Simon Charette

MySQL documentation for reference.

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by

Tim, is it the only MySQL 5.7 test failing against this patch?

Last edited 3 months ago by Simon Charette (previous) (diff)

comment:17 Changed 3 months ago by Tim Graham

Yes, that's the only test failure.

comment:18 Changed 3 months ago by felixxm

IMO query in this test (even if we add "ST_Distance(geoapp_multifields.point, ST_GeomFromText('POINT (-95.363151 29.763374)'))" to the GROUP BY clause) doesn't have much sense. This is not a realistic use case. It also didn't work before 1d070d027c218285b66c0bde8079034b33a87f11.

SELECT
    `geoapp_city`.`name`,
    ST_Distance(`geoapp_multifields`.`point`, ST_GeomFromText('POINT (-95.363151 29.763374)')) AS `distance`,
    COUNT(`geoapp_multifields`.`id`) AS `count`
FROM `geoapp_city`
LEFT OUTER JOIN `geoapp_multifields` ON (`geoapp_city`.`id` = `geoapp_multifields`.`city_id`)
GROUP BY `geoapp_city`.`id`
ORDER BY `geoapp_city`.`id` ASC
LIMIT 1

We can fix this test by changing:

diff --git a/tests/gis_tests/geoapp/test_expressions.py b/tests/gis_tests/geoapp/test_expressions.py
index 2d0ebbcae0..89e83a782f 100644
--- a/tests/gis_tests/geoapp/test_expressions.py
+++ b/tests/gis_tests/geoapp/test_expressions.py
@@ -3,7 +3,7 @@ from unittest import skipUnless
 from django.contrib.gis.db.models import F, GeometryField, Value, functions
 from django.contrib.gis.geos import Point, Polygon
 from django.db import connection
-from django.db.models import Count
+from django.db.models import Count, Min
 from django.test import TestCase, skipUnlessDBFeature
 
 from ..utils import postgis
@@ -56,7 +56,7 @@ class GeoExpressionsTests(TestCase):
             poly=Polygon(((1, 1), (1, 2), (2, 2), (2, 1), (1, 1))),
         )
         qs = City.objects.values('name').annotate(
-            distance=functions.Distance('multifields__point', multi_field.city.point),
+            distance=Min(functions.Distance('multifields__point', multi_field.city.point)),
         ).annotate(count=Count('multifields'))
         self.assertTrue(qs.first())

or by adding multifields__point to the values, i.e. City.objects.values('name', 'multifields__point').

comment:19 Changed 3 months ago by felixxm

Has patch: set

comment:20 Changed 3 months ago by Tim Graham <timograham@…>

In d0ad03c:

Refs #29416 -- Fixed GeoExpressionsTests.test_multiple_annotation() on MySQL 5.7+.

Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.

comment:21 Changed 3 months ago by Tim Graham <timograham@…>

In 31b9cf9:

[2.1.x] Refs #29416 -- Fixed GeoExpressionsTests.test_multiple_annotation() on MySQL 5.7+.

Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.

Backport of d0ad03cded64fc307b15668c81d0c65fd8486eff from master

comment:22 Changed 3 months ago by Tim Graham <timograham@…>

In b57ea27d:

[2.0.x] Refs #29416 -- Fixed GeoExpressionsTests.test_multiple_annotation() on MySQL 5.7+.

Failure introduced in b6e48f514ebe4e31b76e1750e043d4f296e645dc.

Backport of d0ad03cded64fc307b15668c81d0c65fd8486eff from master

comment:23 Changed 3 months ago by Tim Graham

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top