Opened 3 years ago

Last modified 22 months ago

#24096 new Bug

GROUP BY on oracle fails with TextField and BinaryField

Reported by: Josh Smeaton Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: josh.smeaton@…, Shai Berger, mail@…, robinchew@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When a GROUP BY is generated that includes a column of type LOB (BLOB/NCLOB) such as TextField or BinaryField, then an error is thrown on Oracle. See the test case below:

class Author(models.Model):
    name = models.CharField(max_length=100)
    bio = models.TextField(default='')
    age = models.IntegerField()
    friends = models.ManyToManyField('self', blank=True)


def test_annotate_with_textfield_in_values(self):
        qs = Author.objects.annotate(min=Min('pk')).order_by('pk')
        self.assertQuerysetEqual(
            qs, [ 1, 1, 1, 1, 1, 1, 1 ],
            transform=lambda a: a.min
        )

django.db.utils.DatabaseError: ORA-00932: inconsistent datatypes: expected - got NCLOB

Adding a defer('bio') does not work either, as it's still included in the GROUP BY (but not the SELECT). The only way I could get this to work is by providing a values(), and listing all fields except LOB type columns.

This isn't a regression either - the same testcase fails on the 1.6 branch.

Change History (10)

comment:1 Changed 3 years ago by Josh Smeaton

Cc: josh.smeaton@… added

comment:2 Changed 3 years ago by Simon Charette

Triage Stage: UnreviewedAccepted

Looks related to #20971 and #19259.

Do you know if Oracle also supports grouping by primary key only? This is the suggested fix for #19259.

comment:3 Changed 3 years ago by Josh Smeaton

Yes, this is the general case of #20971, so it is almost a duplicate. IMO the linked ticket is a bit confusing. Not sure if we should close this as a duplicate and add this information to the other or not.

In any case, Oracle does not support grouping by primary key. All fields listed in the select list must also be present in the group by list.

There are a few options I can think of:

  1. Document the limitation (must use values()) on Oracle
  2. Remove LOB type columns from the select and group by list
  3. Convert (with lossy behaviour) the LOBs to reasonable types in both lists
  4. Remove deferred columns from GROUP BY when the PK is available
  5. A combo of 2 and 4 - automatically defer LOB type columns and remove them from GROUP BY

Option 5 provides the most reasonable behaviour here I think, as it shouldn't change results, and won't be lossy with regards to data. It should also allow the same django code to run on all backends, at the expense of an extra query on Oracle to fetch LOB results on demand.

comment:4 Changed 3 years ago by Shai Berger

Cc: Shai Berger added

I've just closed #20971 as a duplicate of this one, because the issue is presented more clearly here.

Quoting my comment from that ticket:

This seems to be a correctness problem on Oracle, but may cause performance issues elsewhere (other backends will group over deferred fields, in particular text-fields, which may indeed bring back the problems of #17144).

Of jarshwah's options, I think option 4 (removing deferred columns from the group-by) is a no-brainer (even if the PK is not available). I find option 3 unacceptable (except on MySql, perhaps... no, I'm kidding), and option 1 vastly inferior.

With respect to removal, I don't like the idea of completely automatic removal. Too much magic for my tastes. But -- having actually implemented such automatic removal for a project I was involved with (in a custom query-set class) -- I do see the need for it to be easy; and per my comment above, I also think we should tell non-Oracle users about it.

So my suggestion is:

  1. Add a database-feature can_compare_LOBs
  2. When a query groups over a LOB -- if can_compare_LOBs, warn about performance implications; else, error out (with a clear Django exception rather than an obscure Oracle error)
  3. Add a query-set method to defer all LOBs, and/or a query-set subclass that does it automatically.

comment:5 Changed 3 years ago by Josh Smeaton

The problem with 4 is that natural queries that run on most backends will fail on Oracle:

    Model.objects.annotate(Count('field')) # error on Oracle with TextField, fine on others..
    Model.objects.defer('text_field').annotate(Count('field')) # required for Oracle, and suggested for others

While option 5 is somewhat magical, it's also the best way to have cross-database querysets work without modification. Oracle users will always need to defer LOB-like columns, and will never need to un-defer (since it's an error). Raising a warning on other backends should be enough if they haven't deferred. We could control this behaviour with a setting, but that seems a little wasteful.

Another thing to remember is that any models added with select_related will also require their LOB like columns to be deferred.

I'm not a fan of adding a new queryset method to defer a certain class of column, and I'm not too partial about adding a queryset subclass either - since it'd require special knowledge to recognise that it'd be needed, and will be a source of bugs in libraries that define models with TextFields, but do not test annotations on Oracle.

Another direction could be a meta option: Meta.always_defer = ('field', 'text_field'), with a recommendation to always defer large, rarely used fields. This will have applications larger than this somewhat-narrow usecase, and will be a lot less magical.

comment:6 Changed 3 years ago by Simon Charette

Another direction could be a meta option: Meta.always_defer = ('field', 'text_field'), with a recommendation to always defer large, rarely used fields. This will have applications larger than this somewhat-narrow usecase, and will be a lot less magical.

I know that Adrian proposed something similar on @developpers.

comment:7 Changed 3 years ago by Anssi Kääriäinen

I guess there is also option 6: rework the query to something like:

SELECT innerq.*, basetable.lob
  FROM (SELECT <original_query_without_lob_fields>) innerq
  INNER JOIN basetable on basetable.id = innerq.id

That is, run the original select query without LOB fields, then join the LOB fields back. Needless to say this solution is likely complex to implement. I think the solution would likely look something like this:

  • Generate the select and group by of the query. If the select list has any LOB fields, then remove them from the select and group by and memoize what was removed.
  • Generate the inner query
  • Join back the memoized columns in an outer query
  • You can't actually use SQL using innerq.* as mentioned above, the SELECT list must be in the same order as it was for the original query so that select_related works properly.

As this solution is likely too complex, I think I like the Meta.always_defer approach instead. There should be some way to override the defer though (.defer(None) would clear always_defers, too?)

comment:8 Changed 3 years ago by Shai Berger

The meta option is indeed attractive. I would add a twist: Allow a field to declare itself deferred (this would somewhat parallel unique/Meta.unique_together), that is, a new field option deferred=False.

Build into that some API which will let the LOB field find out that they're on Oracle (or, more generally, a backend with can_compare_lobs=False), and you also get the automatic-compatibility.

Last edited 3 years ago by Shai Berger (previous) (diff)

comment:9 Changed 2 years ago by Jason Robinson

Cc: mail@… added

comment:10 Changed 22 months ago by Robin

Cc: robinchew@… added
Note: See TracTickets for help on using tickets.
Back to Top