Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#27632 closed Bug (fixed)

Oracle backend fails to execute a query with an aggregation that contains an expression in the GROUP BY.

Reported by: Josh Smeaton Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: oracle
Cc: josh.smeaton@…, felisiak.mariusz@…, me@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Oracle fails on queries where there is an expression containing replace parameters in the SELECT list and GROUP BY list, because the database appears to check the SELECT and GROUP BY match before doing parameter substitution.

Parameters in cx_Oracle follow the form :arg0 and :arg1 or named as :price and :discount. Since the arguments in the SELECT and GROUP BY have a different argument number, the database rejects the query with the following error:

*** django.db.utils.DatabaseError: ORA-00979: not a GROUP BY expression

Failing test for aggregation_regress:

Book.objects.annotate(
    discount_price=F('price') * 0.75
).values(
    'discount_price'
).annotate(sum_discount=Sum('price'))

SQL and parameters:

'SELECT 
("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg0) AS "DISCOUNT_PRICE", 
SUM("AGGREGATION_REGRESS_BOOK"."PRICE") AS "SUM_DISCOUNT" 
FROM "AGGREGATION_REGRESS_BOOK" 
GROUP BY ("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg1), 
"AGGREGATION_REGRESS_BOOK"."NAME" 
ORDER BY "AGGREGATION_REGRESS_BOOK"."NAME" ASC' 

args: (0.75, 0.75)

Django can't really do a whole lot here without changing parameter substitution to use named arguments, or by somehow keeping track of what parameter positions are bound to which expression, and reusing the argument names when the Oracle backend replaces the "%s" placeholders with the ":argN" format that cx_Oracle requires. In either case, it'd involve a very large refactoring, and even then I'm not sure how feasible it would be.

I *think* this is a problem that should be solved by Oracle. If anyone is able to find any references to this bug in Oracle documentation or systems I'd love to see it. As I don't work for a company using Oracle anymore, I'm not able to utilise support to investigate further.

Attachments (1)

test_annotation_with_value.diff (961 bytes ) - added by Mariusz Felisiak 8 years ago.

Download all attachments as: .zip

Change History (14)

by Mariusz Felisiak, 8 years ago

comment:1 by Mariusz Felisiak, 8 years ago

Cc: felisiak.mariusz@… added
Triage Stage: UnreviewedAccepted

I attached test that currently fails.

comment:2 by Mariusz Felisiak, 8 years ago

Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

IMO we can avoid this error ("ORA-00979: not a GROUP BY expression") by omitting adding Value expression (bind parameters) in CombinedExpression to group by clause. Query without it i.e.

SELECT
  ("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg0) AS "DISCOUNT_PRICE",
  SUM(("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg1)) AS "SUM_DISCOUNT"
FROM "AGGREGATION_REGRESS_BOOK"
GROUP BY
  "AGGREGATION_REGRESS_BOOK"."PRICE",
  "AGGREGATION_REGRESS_BOOK"."NAME"
ORDER BY "AGGREGATION_REGRESS_BOOK"."NAME" ASC

will work this same as:

SELECT
  ("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg0) AS "DISCOUNT_PRICE",
  SUM(("AGGREGATION_REGRESS_BOOK"."PRICE" * :arg1)) AS "SUM_DISCOUNT"
FROM "AGGREGATION_REGRESS_BOOK"
GROUP BY
  "AGGREGATION_REGRESS_BOOK"."PRICE" * :arg2,
  "AGGREGATION_REGRESS_BOOK"."NAME"
ORDER BY "AGGREGATION_REGRESS_BOOK"."NAME" ASC
}}}.

comment:3 by Josh Smeaton, 8 years ago

I didn't actually know that rule, but it totally makes sense. You've grouped by that value, so you can use it how you will. If that's the case, then shouldn't we just need to extract the col references from any expressions that we need to group by? Ultimately, the expression doesn't matter, just the columns an expression may reference. Indeed, by not including the base expression in the set returned by get_group_by_cols, the test passes.

diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index 20268ee660..46fb86ea21 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -323,8 +323,6 @@ class BaseExpression(object):
         return False, ()

     def get_group_by_cols(self):
-        if not self.contains_aggregate:
-            return [self]
         cols = []
         for source in self.get_source_expressions():
             cols.extend(source.get_group_by_cols())

I haven't tested this thoroughly, but the aggregation, aggregation_regress, and expressions test modules complete successfully.

comment:4 by Josh Smeaton, 8 years ago

Thinking about this more, this isn't always going to be correct. What if you want to GROUP BY the substring of a column? We'd be changing results.

Name, Siblings
Josh, 3
James, 3
SELECT SUM(Siblings), SUBSTR(NAME, 1) FROM TABLE
GROUP BY SUBSTR(NAME, '1') 

J, 6
SELECT SUM(Siblings), SUBSTR(NAME, 1) FROM TABLE
GROUP BY NAME

J, 3
J, 3

in reply to:  4 comment:5 by Shai Berger, 8 years ago

Replying to Josh Smeaton:

Thinking about this more, this isn't always going to be correct. What if you want to GROUP BY the substring of a column? We'd be changing results.

Yes. I have made a similar note on the PR that begat this issue, regarding GROUP BY a sum of columns.

The Django ORM supports user-named arguments on most backends, but never uses named arguments itself. The :arg0, :arg1 etc. are just the Oracle implementation of a query that is, at a higher level, phrased with %s, %s. While going down the levels, we lose the information that two formal arguments in the SQL actually refer to the same argument in the user code. This is the real problem we need to solve, and I seriously doubt it can be achieved in the 1.11 time-frame.

comment:6 by Adam Johnson, 8 years ago

Cc: me@… added

comment:7 by Mariusz Felisiak, 8 years ago

Has patch: set

comment:8 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In 6dbe56ed:

Fixed #27632 -- Unified query parameters by their values on Oracle.

comment:9 by Tim Graham, 8 years ago

Has patch: unset
Resolution: fixed
Status: closednew

This broke a test on Python 2 (sorry that's not on the PR builder):

======================================================================
ERROR: setUpClass (expressions_case.tests.CaseExpressionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tim/code/django/django/test/testcases.py", line 1040, in setUpClass
    cls.setUpTestData()
  File "/home/tim/code/django/tests/expressions_case/tests.py", line 27, in setUpTestData
    o = CaseTestModel.objects.create(integer=1, integer2=1, string='1')
  File "/home/tim/code/django/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/tim/code/django/django/db/models/query.py", line 392, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/tim/code/django/django/db/models/base.py", line 804, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/tim/code/django/django/db/models/base.py", line 834, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/home/tim/code/django/django/db/models/base.py", line 920, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/home/tim/code/django/django/db/models/base.py", line 959, in _do_insert
    using=using, raw=raw)
  File "/home/tim/code/django/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/tim/code/django/django/db/models/query.py", line 1032, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/tim/code/django/django/db/models/sql/compiler.py", line 1045, in execute_sql
    cursor.execute(sql, params)
  File "/home/tim/code/django/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/tim/code/django/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/tim/code/django/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/tim/code/django/django/db/backends/oracle/base.py", line 485, in execute
    return self.cursor.execute(query, self._param_generator(params))
DatabaseError: ORA-12704: character set mismatch

The issue looks like the bytestring for BinaryField and text string for TextField are treated as equal in Python.

comment:10 by Tim Graham <timograham@…>, 8 years ago

In 4579c3f:

Refs #27632 -- Unified query parameters by their types and values on Oracle.

Fixed Python 2 regression in 6dbe56ed7855f34585884a2381fb1cec22ddc824.

Thanks Simon Charette for the implementation idea.

comment:11 by Tim Graham, 8 years ago

Resolution: fixed
Status: newclosed

comment:12 by GitHub <noreply@…>, 8 years ago

In 26c9f529:

Refs #27632 -- Simplified params dict creation for Oracle (#7772)

comment:13 by Tim Graham <timograham@…>, 8 years ago

In bf1c9570:

Refs #23919 -- Removed Python 2 workaround for hashing Oracle params (refs #27632).

Note: See TracTickets for help on using tickets.
Back to Top