Opened 7 years ago

Last modified 5 years ago

#28107 closed Bug

Can't perform annotation on related table when un-managed model is backed by a DB view — at Version 11

Reported by: powderflask Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: QuerySet.extra
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by powderflask)

I'm working with a legacy DB (ArcSDE database -- really ugly) -- have been able to accommodate its oddities without resorting much to raw SQL, but django upgrade (1.8 --> 1.11) caused a previously working annotation to fail:
psycopg2.ProgrammingError: :column ... must appear in the GROUP BY clause or be used in an aggregate function

DB: PostgreSQL 9.3.16 (i.e., this is not same issue as #26758 ) Python3.6, Django1.11

The annotation simply counts the number of related records from a related 'attachments' table:
qs.annotate(num_attachments=Count('attachments'))

The root cause appears to be that the relation between an unmanaged model (backed by a DB View) and attachments tables uses a unique field other than the main model's primary key (I know -- told you it was ugly - ArcSDE does not really support relations, except they implement attachments with this odd ball sigh ).
The change in behaviour seems to manifest from #19259 (I believe django1.8 added all fields to the groupby clause).
Since django now includes only the primary key in the groupby clause, postgresql won't do this aggregation across a relation that uses a non-pk field.

I suspect there is a general issue here that aggregations on a related-table won't work in postgresql unless the relation is on the primary key field (so, yeah, basically this issue applies to almost no one, right...).
UPDATE: The root cause is actually that Postgresql treats Views differently than Tables w.r.t. what is required in the group by clause.

Seems likely there is a better solution to this, but after a half-day of search / effort, I resorted to the following:

qs.extra(select={'num_attachments':
            'SELECT COUNT("attachmentid") FROM {attach_table} WHERE {attach_table}.rel_globalid = {model_table}.globalid'.format(
                    model_table  = qs.model._meta.db_table,
                    attach_table = qs.model.AttachmentModel._meta.db_table,
            )},)

This works and achieves my goal -- to annotate model with the number of related attachments.
Since the model.attachments.count() query works just fine, I'm considering eliminating this annotation and replacing with a property on the model class, what's one more query... I'm sure there must be an smoother way, but it eludes me...

Since the docs suggested to open a ticket for queries that could not be resolved without resorting to extra(), here it is, for whatever its worth. Hope this hasn't been a complete waste of time for you.

Change History (12)

comment:1 by Simon Charette, 7 years ago

Cc: Simon Charette added

It's possible that's an edge case with the handling of to_field.

Would it be possible for your to provide all the models involved in the queryset you're generating as it's really hard to figure out if the issue actually lies in the _GROUP BY_ logic without a set of models to reproduce the issue.

comment:2 by powderflask, 7 years ago

The DB and relations are just nasty to set it all up. I'm pretty sure I know what's needed to reproduce this -- I'll try to get something simpler to break in the same way, no sense in you banging your head against the ArcSDE wall.

BTW - edge case is a very polite term for what this really is... After writing this up, I decided to sacrifice the extra query to get the attachment.count() to buy a world of ponies. My code base loves me for it

comment:3 by powderflask, 7 years ago

Resolution: worksforme
Status: newclosed

I tried. I cannot reproduce this with a simpler test case. My original suspicion is completely wrong - it works just fine.

I can only reproduce this in the insanely complex DB -- I will continue to try to track down the cause, but for now I'd say this issue is too undefined to work on. Sorry for the trouble, I'm closing this for now.

comment:4 by powderflask, 7 years ago

This may have to do with one of the original models drawing data from a View - found this:
" The feature of Postgres to be able to use the primary key of a table with GROUP BY and not need to add the other columns of that table in the GROUP BY clause is relatively new and works only for base tables. The optimizer is not (yet?) clever enough to identify primary keys for views, ctes or derived tables."
https://dba.stackexchange.com/questions/88988/postgres-error-column-must-appear-in-the-group-by-clause-or-be-used-in-an-aggre

I will try to reproduce this.

comment:5 by Simon Charette, 7 years ago

powderflask, if this happens to be the source of the issue and you are using un-managed models (_meta.managed = False) maybe we could branch on that to prevent the optimization.

by powderflask, 7 years ago

Attachment: issue28107.zip added

small django app with models, migrations, and tests that illustrate the issue

comment:6 by powderflask, 7 years ago

Resolution: worksforme
Status: closednew

OK - that's got it. Yep -- this issues occurs when at least one of the models in the aggregation query is un-managed and backed by a DB view rather than a table.

Attached is a zip of a simple django app that runs in a default django container, demonstrates the issue:

1) Demonstrate these weird relations work with models backed by DB tables:

  • INSTALLED_APPS = [ 'issue28107.apps.Issue28107Config', ... ]
  • configure / create postgre DB,
  • manage.py migrate
  • run the unit-test -- it should pass

2) Replace table with view (created by migrations)

  • in models.py, remove 2 comments from Treatment.Meta:
            managed = False
            db_table = 'issue28107_treatment_vw'
    
    
  • re-run unit-test -- it should fail:

column "issue28107_treatment_vw.globalid" must appear in the GROUP BY clause or be used in an aggregate function

This is a backwards-compatibility issue -- this worked at least up to django1.8

I have no idea where to begin with this in terms of suggesting a patch -- any pointers?
thank you for the quick reply and suggestions -- awesome.

comment:7 by powderflask, 7 years ago

It occurs to me that if there were some way to force values into the group-by clause, that would serve as a reasonable workaround, given this is bound to be a fairly rare use-case.
I did read some hacks that did this in 1.8, but it looks like query.group_by is now a boolean rather than a list of fields... was pretty ugly anyhow.

But I'm not missing something more obvious here am I -- like an extra() clause or something that could force the offending aggregate fields into the group_by clause?

comment:8 by Simon Charette, 7 years ago

Has patch: set
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug
Version: 1.11master

Thanks to your detailed report writing a patch was easy.

PR

comment:9 by powderflask, 7 years ago

Should I change the title of this issue to reflect the root cause was a model backed by a View?

comment:10 by Simon Charette, 7 years ago

Sure go ahead!

comment:11 by powderflask, 7 years ago

Description: modified (diff)
Summary: Can't perform annotation on related table when relation between tables not on primary keyCan't perform annotation on related table when un-managed model is backed by a DB view
Note: See TracTickets for help on using tickets.
Back to Top