Opened 3 years ago

Closed 3 years ago

#33282 closed Bug (fixed)

django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

Reported by: Antonio Terceiro Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,

I'm working on porting lava to Django 3.2. When I run the test suite, I find 2 issues. One is #32690, for which I have the corresponding patch backported locally. The other one is this is a bunch of crashes that look like this:

E django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

I wasn't able to produce a minimal reproducer app yet, but I will try to point to the relevant code.

in lava we have a few special model managers for handling access control. They are here: https://git.lavasoftware.org/lava/lava/-/blob/master/lava_scheduler_app/managers.py

This issue can easily be reproduced in a shell session:

$ ./manage.py shell
Python 3.9.8 (main, Nov  7 2021, 15:47:09) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.27.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from lava_scheduler_app.models import TestJob, User

In [2]: TestJob.objects.visible_by_user(User.objects.last())
Out[2]: ---------------------------------------------------------------------------
CardinalityViolation                      Traceback (most recent call last)
/usr/lib/python3/dist-packages/django/db/backends/utils.py in _execute(self, sql, params, *ignored_wrapper_args)
     83             else:
---> 84                 return self.cursor.execute(sql, params)
     85 

CardinalityViolation: more than one row returned by a subquery used as an expression


The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
/usr/lib/python3/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

/usr/lib/python3/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
    392                         if cls is not object \
    393                                 and callable(cls.__dict__.get('__repr__')):
--> 394                             return _repr_pprint(obj, self, cycle)
    395 
    396             return _default_pprint(obj, self, cycle)

/usr/lib/python3/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    698     """A pprint that just redirects to the normal repr function."""
    699     # Find newlines and replace them with p.break_()
--> 700     output = repr(obj)
    701     lines = output.splitlines()
    702     with p.group():

/usr/lib/python3/dist-packages/django/db/models/query.py in __repr__(self)
    254 
    255     def __repr__(self):
--> 256         data = list(self[:REPR_OUTPUT_SIZE + 1])
    257         if len(data) > REPR_OUTPUT_SIZE:
    258             data[-1] = "...(remaining elements truncated)..."

/usr/lib/python3/dist-packages/django/db/models/query.py in __len__(self)
    260 
    261     def __len__(self):
--> 262         self._fetch_all()
    263         return len(self._result_cache)
    264 

/usr/lib/python3/dist-packages/django/db/models/query.py in _fetch_all(self)
   1322     def _fetch_all(self):
   1323         if self._result_cache is None:
-> 1324             self._result_cache = list(self._iterable_class(self))
   1325         if self._prefetch_related_lookups and not self._prefetch_done:
   1326             self._prefetch_related_objects()

/usr/lib/python3/dist-packages/django/db/models/query.py in __iter__(self)
     49         # Execute the query. This will also fill compiler.select, klass_info,
     50         # and annotations.
---> 51         results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
     52         select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
     53                                                   compiler.annotation_col_map)

/usr/lib/python3/dist-packages/django/db/models/sql/compiler.py in execute_sql(self, result_type, chunked_fetch, chunk_size)
   1173             cursor = self.connection.cursor()
   1174         try:
-> 1175             cursor.execute(sql, params)
   1176         except Exception:
   1177             # Might fail for server-side cursors (e.g. connection closed)

/usr/lib/python3/dist-packages/django/db/backends/utils.py in execute(self, sql, params)
     96     def execute(self, sql, params=None):
     97         with self.debug_sql(sql, params, use_last_executed_query=True):
---> 98             return super().execute(sql, params)
     99 
    100     def executemany(self, sql, param_list):

/usr/lib/python3/dist-packages/django/db/backends/utils.py in execute(self, sql, params)
     64 
     65     def execute(self, sql, params=None):
---> 66         return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
     67 
     68     def executemany(self, sql, param_list):

/usr/lib/python3/dist-packages/django/db/backends/utils.py in _execute_with_wrappers(self, sql, params, many, executor)
     73         for wrapper in reversed(self.db.execute_wrappers):
     74             executor = functools.partial(wrapper, executor)
---> 75         return executor(sql, params, many, context)
     76 
     77     def _execute(self, sql, params, *ignored_wrapper_args):

/usr/lib/python3/dist-packages/django/db/backends/utils.py in _execute(self, sql, params, *ignored_wrapper_args)
     82                 return self.cursor.execute(sql)
     83             else:
---> 84                 return self.cursor.execute(sql, params)
     85 
     86     def _executemany(self, sql, param_list, *ignored_wrapper_args):

/usr/lib/python3/dist-packages/django/db/utils.py in __exit__(self, exc_type, exc_value, traceback)
     88                 if dj_exc_type not in (DataError, IntegrityError):
     89                     self.wrapper.errors_occurred = True
---> 90                 raise dj_exc_value.with_traceback(traceback) from exc_value
     91 
     92     def __call__(self, func):

/usr/lib/python3/dist-packages/django/db/backends/utils.py in _execute(self, sql, params, *ignored_wrapper_args)
     82                 return self.cursor.execute(sql)
     83             else:
---> 84                 return self.cursor.execute(sql, params)
     85 
     86     def _executemany(self, sql, param_list, *ignored_wrapper_args):

ProgrammingError: more than one row returned by a subquery used as an expression

If I downgrade Django to 2.2, that just works. This is reproducible with Django 3.2 and with the current main branch from git.

Change History (14)

comment:1 by Antonio Terceiro, 3 years ago

Bisecting points to the same culprit as #32690:

35431298226165986ad07e91f9d3aca721ff38ec is the first bad commit
commit 35431298226165986ad07e91f9d3aca721ff38ec
Author: Simon Charette <charette.s@gmail.com>
Date:   Wed Mar 6 01:05:55 2019 -0500

    Refs #27149 -- Moved subquery expression resolving to Query.
    
    This makes Subquery a thin wrapper over Query and makes sure it respects
    the Expression source expression API by accepting the same number of
    expressions as it returns. Refs #30188.
    
    It also makes OuterRef usable in Query without Subquery wrapping. This
    should allow Query's internals to more easily perform subquery push downs
    during split_exclude(). Refs #21703.

 django/db/models/expressions.py | 64 +++++++----------------------------------
 django/db/models/sql/query.py   | 28 ++++++++++++++++--
 django/db/models/sql/where.py   | 15 ++++++++++
 tests/queries/tests.py          |  2 +-
 4 files changed, 52 insertions(+), 57 deletions(-)
bisect run success

comment:2 by Antonio Terceiro, 3 years ago

FWIW the generated query that causes this is:

SELECT
COUNT (*)
AS "__count" FROM "lava_scheduler_app_testjob"
WHERE (((NOT "lava_scheduler_app_testjob".
	 "is_public" AND "lava_scheduler_app_testjob"."submitter_id" =
	 %s) OR ("lava_scheduler_app_testjob".
		 "is_public" AND "lava_scheduler_app_testjob".
		 "id" IN (SELECT U0.
			  "id" FROM "lava_scheduler_app_testjob" U0 LEFT OUTER
			  JOIN "lava_scheduler_app_testjob_viewing_groups" U1
			  ON (U0."id" =
			      U1."testjob_id") WHERE U1.
			  "group_id" IS NULL)
		 AND (("lava_scheduler_app_testjob".
		       "actual_device_id" IS NOT NULL AND
		       "lava_scheduler_app_testjob".
		       "actual_device_id" IN (SELECT V0.
					      "hostname" FROM
					      "lava_scheduler_app_device" V0
					      LEFT OUTER JOIN
					      "lava_scheduler_app_groupdevicepermission"
					      V1 ON (V0."hostname" =
						     V1.
						     "device_id") LEFT OUTER
					      JOIN "auth_group" V2 ON (V1.
								       "group_id"
								       =
								       V2.
								       "id")
					      LEFT OUTER JOIN
					      "auth_user_groups" V3 ON (V2.
									"id" =
									V3.
									"group_id")
					      LEFT OUTER JOIN
					      "auth_permission" V5 ON (V1.
								       "permission_id"
								       =
								       V5.
								       "id")
					      GROUP BY V0."hostname",
					      (SELECT U0.
					       "name" FROM
					       "lava_scheduler_app_devicetype"
					       U0 LEFT OUTER JOIN
					       "lava_scheduler_app_groupdevicetypepermission"
					       U1 ON (U0."name" =
						      U1.
						      "devicetype_id") LEFT
					       OUTER JOIN "auth_group" U2
					       ON (U1."group_id" =
						   U2.
						   "id") LEFT OUTER JOIN
					       "auth_user_groups" U3 ON (U2.
									 "id"
									 =
									 U3.
									 "group_id")
					       LEFT OUTER JOIN
					       "auth_permission" U5 ON (U1.
									"permission_id"
									=
									U5.
									"id")
					       GROUP BY U0.
					       "name"
					       HAVING (SUM
						       (CASE
							WHEN (U5."codename" =
							      %s) THEN %
							s ELSE % s END) =
						       %s OR
						       NOT (SUM
							    (CASE
							     WHEN (U3.
								   "user_id" =
								   %s AND U5.
								   "codename"
								   IN (%s, %s,
								       %s))
							     THEN % s ELSE %
							     s END) =
							    %s)))
					      HAVING ((SUM
						       (CASE
							WHEN (V5."codename" =
							      %s) THEN %
							s ELSE % s END) =
						       %s AND V0.
						       "device_type_id"
						       IN (SELECT U0.
							   "name" FROM
							   "lava_scheduler_app_devicetype"
							   U0 LEFT OUTER JOIN
							   "lava_scheduler_app_groupdevicetypepermission"
							   U1 ON (U0."name" =
								  U1.
								  "devicetype_id")
							   LEFT OUTER JOIN
							   "auth_group" U2
							   ON (U1."group_id" =
							       U2.
							       "id") LEFT
							   OUTER JOIN
							   "auth_user_groups"
							   U3 ON (U2."id" =
								  U3.
								  "group_id")
							   LEFT OUTER JOIN
							   "auth_permission"
							   U5 ON (U1.
								  "permission_id"
								  =
								  U5.
								  "id") GROUP
							   BY U0.
							   "name"
							   HAVING (SUM
								   (CASE
								    WHEN (U5.
									  "codename"
									  =
									  %s)
								    THEN %
								    s ELSE %
								    s END) =
								   %s OR
								   NOT (SUM
									(CASE
									 WHEN
									 (U3.
									  "user_id"
									  =
									  %s
									  AND
									  U5.
									  "codename"
									  IN
									  (%s,
									   %s,
									   %s))
									 THEN
									 %
									 s
									 ELSE
									 %
									 s
									 END)
									=
									%s))))
						      OR
						      NOT (SUM
							   (CASE
							    WHEN (V3.
								  "user_id" =
								  %s AND V5.
								  "codename"
								  IN (%s, %s,
								      %s))
							    THEN % s ELSE %
							    s END) =
							   %s))))
		      OR ("lava_scheduler_app_testjob".
			  "actual_device_id" IS NULL AND
			  "lava_scheduler_app_testjob".
			  "requested_device_type_id" IN (SELECT U0.
							 "name" FROM
							 "lava_scheduler_app_devicetype"
							 U0 LEFT OUTER JOIN
							 "lava_scheduler_app_groupdevicetypepermission"
							 U1 ON (U0."name" =
								U1.
								"devicetype_id")
							 LEFT OUTER JOIN
							 "auth_group" U2
							 ON (U1."group_id" =
							     U2.
							     "id") LEFT OUTER
							 JOIN
							 "auth_user_groups" U3
							 ON (U2."id" =
							     U3.
							     "group_id") LEFT
							 OUTER JOIN
							 "auth_permission" U5
							 ON (U1.
							     "permission_id" =
							     U5.
							     "id") GROUP BY
							 U0.
							 "name"
							 HAVING (SUM
								 (CASE
								  WHEN (U5.
									"codename"
									=
									%s)
								  THEN %
								  s ELSE %
								  s END) =
								 %s OR
								 NOT (SUM
								      (CASE
								       WHEN
								       (U3.
									"user_id"
									=
									%s AND
									U5.
									"codename"
									IN
									(%s,
									 %s,
									 %s))
								       THEN %
								       s ELSE
								       %
								       s END)
								      =
								      %s))))))
	OR (NOT
	    ("lava_scheduler_app_testjob".
	     "id" IN (SELECT U0.
		      "id" FROM "lava_scheduler_app_testjob" U0 LEFT OUTER
		      JOIN "lava_scheduler_app_testjob_viewing_groups" U1
		      ON (U0."id" =
			  U1."testjob_id") WHERE U1.
		      "group_id" IS NULL)) AND NOT (EXISTS (SELECT (1) AS "a"
							    FROM
							    "lava_scheduler_app_testjob_viewing_groups"
							    V1 WHERE (V1.
								      "group_id"
								      IN
								      (SELECT
								       U0.
								       "id"
								       FROM
								       "auth_group"
								       U0
								       WHERE
								       NOT
								       (U0.
									"id"
									IN
									(%s)))
								      AND V1.
								      "testjob_id"
								      =
								      ("lava_scheduler_app_testjob".
								       "id"))
							    LIMIT 1)))) AND
       "lava_scheduler_app_testjob"."submitter_id" =
       %s AND "lava_scheduler_app_testjob"."state" IN (%s, %s))

comment:3 by Simon Charette, 3 years ago

Cc: Simon Charette added
Resolution: needsinfo
Status: newclosed

The issue seems related an attempt at doing a GROUP BY by a subquery that returns more than one row. See the GROUP BY V0."hostname", (SELECT U0."name" FROM "lava_scheduler_app_devicetype" ...) part.

The Q(actual_device__isnull=False, actual_device__in=accessible_devices) lookup in RestrictedTestJobQuerySet.accessible_by_user is causing that somehow, and I assume Device.objects.accessible_by_user is the culprit as it must do a subquery annotation somehow that is not limited to one row.

By giving a cursory look at your code base I couldn't identify the origin of the subquery annotation so I'll close with needsinfo for now but please re-open if you can create a minimal test case as I'll be monitoring this issue.

Version 0, edited 3 years ago by Simon Charette (next)

comment:4 by Antonio Terceiro, 3 years ago

Hi, I spent some time debugging this today. This seems to be caused by get_group_by_cols() returning an incorrect value for a Query object. In a pdb session against the main branch of the git repository, I get this:

$ pdb3 ./manage.py runscript test 
> /home/terceiro/src/linaro/lava/manage.py(22)<module>()
-> import argparse
(Pdb) break /home/terceiro/src/django/django/db/models/lookups.py:434
Breakpoint 1 at /home/terceiro/src/django/django/db/models/lookups.py:434
(Pdb) c
> /home/terceiro/src/django/django/db/models/lookups.py(434)get_group_by_cols()
-> cols.extend(self.rhs.get_group_by_cols())
(Pdb) p self.rhs
<django.db.models.sql.query.Query object at 0x7f6ddb7bd580>
(Pdb) p self.rhs.get_group_by_cols()
[<django.db.models.sql.query.Query object at 0x7f6ddb7bd580>]

In commit 35431298226165986ad07e91f9d3aca721ff38ec, which caused this, django.db.models.sql.query.Query is made a subclass of django.db.models.expressions.BaseExpression. BaseExpression.get_group_by_calls is implemented like this:

    def get_group_by_cols(self, alias=None):                                     
        if not self.contains_aggregate:                                          
            return [self]                                                        
        cols = []                                                                
        for source in self.get_source_expressions():                             
            cols.extend(source.get_group_by_cols())                              
        return cols                                                              

Since BaseExpression also hardcodes contains_aggregate to False, that first if statement always applies. Providing an appropriate implementation of contains_aggregate to Query fixes things for me here:

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 2c5f11cbbf..f7c43ff622 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -414,6 +414,11 @@ class Query(BaseExpression):
         annotation.set_source_expressions(new_exprs)
         return annotation, col_cnt
 
+    @cached_property
+    def contains_aggregate(self):
+        annotations = self.annotations.values()
+        return any(getattr(ann, 'contains_aggregate', True) for ann in annotations)
+
     def get_aggregation(self, using, added_aggregate_names):
         """
         Return the dictionary with the values of the existing aggregations.

I wasn't able to write an unit tests for this so far, though.

comment:5 by Antonio Terceiro, 3 years ago

Just noticed that this change causes a few errors for other tests so it's not the final fix.

comment:6 by Antonio Terceiro, 3 years ago

Resolution: needsinfo
Status: closednew

Unfortunately I wasn't able yet to write an unit test that triggers this issue. I was, however, able to write a patch that both fixes my issue and does not cause any existing unit test to fail: https://github.com/django/django/pull/15129

comment:7 by Simon Charette, 3 years ago

Has patch: set
Needs tests: set
Patch needs improvement: set

comment:8 by Simon Charette, 3 years ago

Triage Stage: UnreviewedAccepted

Managed to reproduce with this tests

  • tests/aggregation/tests.py

    diff --git a/tests/aggregation/tests.py b/tests/aggregation/tests.py
    index 9ba70cb35e..f3277d2b5f 100644
    a b def test_aggregation_nested_subquery_outerref(self):  
    13351335        ).values_list('publisher_count', flat=True)
    13361336        self.assertSequenceEqual(books_breakdown, [1] * 6)
    13371337
     1338    def test_filter_in_subquery_or_aggregation(self):
     1339        authors = Author.objects.annotate(
     1340            books_count=Count('book'),
     1341        ).filter(
     1342            Q(pk__in=Book.objects.values('authors')) | Q(books_count__gt=0)
     1343        )
     1344        self.assertQuerysetEqual(authors, Author.objects.all(), ordered=False)
     1345
    13381346    def test_aggregation_random_ordering(self):
    13391347        """Random() is not included in the GROUP BY when used for ordering."""
    13401348        authors = Author.objects.annotate(contact_count=Count('book')).order_by('?')

The gist of it is that the logic that was added to Subquery.get_group_by_cols should have been added to sql.Query.get_group_by_cols and the latter should have proxied it.

The issue is triggered when a lookup containing a subquery is combined using OR to a lookup against an aggregation. When this happens the ORM has to choice but to push the filtering to the HAVING clause and thus group by all inputs passed down to the lookups. Since Subquery resolves to sql.Query and the latter defaults to returning self as inherited from BaseExpression the ORM tries to group by a subquery that potentially returns multiple rows.

Antonio if don't mind if I assign the issue to me I have a patch set almost ready to address the problem and include extra regression tests for new uses cases such as lookup annotations which is a newly supported in 4.0.

in reply to:  8 comment:9 by Antonio Terceiro, 3 years ago

Replying to Simon Charette:

Antonio if don't mind if I assign the issue to me I have a patch set almost ready to address the problem and include extra regression tests for new uses cases such as lookup annotations which is a newly supported in 4.0.

Sure, please go ahead. I just want this to get fixed.

It would be nice if it's possible to backport the fix to 3.2 as well.

comment:10 by Simon Charette, 3 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned

I should be able to submit a patch this week.

It would be nice if it's possible to backport the fix to 3.2 as well.

Unfortunately that won't be possible per our backporting policies, too much risk for regressions and 3.2 is only receiving security backports at this point.

in reply to:  10 comment:11 by Antonio Terceiro, 3 years ago

Replying to Simon Charette:

I should be able to submit a patch this week.

It would be nice if it's possible to backport the fix to 3.2 as well.

Unfortunately that won't be possible per our backporting policies, too much risk for regressions and 3.2 is only receiving security backports at this point.

this means that 3.x will be left with two crash-causing regressions (this one and the one in #32690)

comment:12 by Simon Charette, 3 years ago

Needs tests: unset
Patch needs improvement: unset

comment:13 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In e5a92d4:

Fixed #33282 -- Fixed a crash when OR'ing subquery and aggregation lookups.

As a QuerySet resolves to Query the outer column references grouping logic
should be defined on the latter and proxied from Subquery for the cases where
get_group_by_cols is called on unresolved expressions.

Thanks Antonio Terceiro for the report and initial patch.

Note: See TracTickets for help on using tickets.
Back to Top