Opened 8 years ago
Closed 5 years ago
#28107 closed Bug (fixed)
Can't perform annotation on related table when un-managed model is backed by a DB view
Reported by: | powderflask | Owned by: | Vojtěch Boček |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | QuerySet.extra |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I'm working with a legacy DB (ArcSDE database -- really ugly) -- have been able to accommodate its oddities without resorting much to raw SQL, but django upgrade (1.8 --> 1.11) caused a previously working annotation to fail:
psycopg2.ProgrammingError: :column ... must appear in the GROUP BY clause or be used in an aggregate function
DB: PostgreSQL 9.3.16 (i.e., this is not same issue as #26758 ) Python3.6, Django1.11
The annotation simply counts the number of related records from a related 'attachments' table:
qs.annotate(num_attachments=Count('attachments'))
The root cause appears to be that the relation between an unmanaged model (backed by a DB View) and attachments tables uses a unique field other than the main model's primary key (I know -- told you it was ugly - ArcSDE does not really support relations, except they implement attachments with this odd ball sigh ).
The change in behaviour seems to manifest from #19259 (I believe django1.8 added all fields to the groupby clause).
Since django now includes only the primary key in the groupby clause, postgresql won't do this aggregation across a relation that uses a non-pk field.
I suspect there is a general issue here that aggregations on a related-table won't work in postgresql unless the relation is on the primary key field (so, yeah, basically this issue applies to almost no one, right...).
UPDATE: The root cause is actually that Postgresql treats Views differently than Tables w.r.t. what is required in the group by clause.
Seems likely there is a better solution to this, but after a half-day of search / effort, I resorted to the following:
qs.extra(select={'num_attachments': 'SELECT COUNT("attachmentid") FROM {attach_table} WHERE {attach_table}.rel_globalid = {model_table}.globalid'.format( model_table = qs.model._meta.db_table, attach_table = qs.model.AttachmentModel._meta.db_table, )},)
This works and achieves my goal -- to annotate model with the number of related attachments.
Since the model.attachments.count()
query works just fine, I'm considering eliminating this annotation and replacing with a property on the model class, what's one more query... I'm sure there must be an smoother way, but it eludes me...
Since the docs suggested to open a ticket for queries that could not be resolved without resorting to extra(), here it is, for whatever its worth. Hope this hasn't been a complete waste of time for you.
Attachments (1)
Change History (31)
comment:1 by , 8 years ago
Cc: | added |
---|
comment:2 by , 8 years ago
The DB and relations are just nasty to set it all up. I'm pretty sure I know what's needed to reproduce this -- I'll try to get something simpler to break in the same way, no sense in you banging your head against the ArcSDE wall.
BTW - edge case is a very polite term for what this really is... After writing this up, I decided to sacrifice the extra query to get the attachment.count() to buy a world of ponies. My code base loves me for it
comment:3 by , 8 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I tried. I cannot reproduce this with a simpler test case. My original suspicion is completely wrong - it works just fine.
I can only reproduce this in the insanely complex DB -- I will continue to try to track down the cause, but for now I'd say this issue is too undefined to work on. Sorry for the trouble, I'm closing this for now.
comment:4 by , 8 years ago
This may have to do with one of the original models drawing data from a View - found this:
" The feature of Postgres to be able to use the primary key of a table with GROUP BY and not need to add the other columns of that table in the GROUP BY clause is relatively new and works only for base tables. The optimizer is not (yet?) clever enough to identify primary keys for views, ctes or derived tables."
https://dba.stackexchange.com/questions/88988/postgres-error-column-must-appear-in-the-group-by-clause-or-be-used-in-an-aggre
I will try to reproduce this.
comment:5 by , 8 years ago
powderflask, if this happens to be the source of the issue and you are using un-managed models (_meta.managed = False
) maybe we could branch on that to prevent the optimization.
by , 8 years ago
Attachment: | issue28107.zip added |
---|
small django app with models, migrations, and tests that illustrate the issue
comment:6 by , 8 years ago
Resolution: | worksforme |
---|---|
Status: | closed → new |
OK - that's got it. Yep -- this issues occurs when at least one of the models in the aggregation query is un-managed and backed by a DB view rather than a table.
Attached is a zip of a simple django app that runs in a default django container, demonstrates the issue:
1) Demonstrate these weird relations work with models backed by DB tables:
-
INSTALLED_APPS = [ 'issue28107.apps.Issue28107Config', ... ]
- configure / create postgre DB,
- manage.py migrate
- run the unit-test -- it should pass
2) Replace table with view (created by migrations)
- in models.py, remove 2 comments from Treatment.Meta:
managed = False db_table = 'issue28107_treatment_vw'
- re-run unit-test -- it should fail:
column "issue28107_treatment_vw.globalid" must appear in the GROUP BY clause or be used in an aggregate function
This is a backwards-compatibility issue -- this worked at least up to django1.8
I have no idea where to begin with this in terms of suggesting a patch -- any pointers?
thank you for the quick reply and suggestions -- awesome.
comment:7 by , 8 years ago
It occurs to me that if there were some way to force values into the group-by clause, that would serve as a reasonable workaround, given this is bound to be a fairly rare use-case.
I did read some hacks that did this in 1.8, but it looks like query.group_by is now a boolean rather than a list of fields... was pretty ugly anyhow.
But I'm not missing something more obvious here am I -- like an extra() clause or something that could force the offending aggregate fields into the group_by clause?
comment:8 by , 8 years ago
Has patch: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Version: | 1.11 → master |
Thanks to your detailed report writing a patch was easy.
comment:9 by , 8 years ago
Should I change the title of this issue to reflect the root cause was a model backed by a View?
comment:11 by , 8 years ago
Description: | modified (diff) |
---|---|
Summary: | Can't perform annotation on related table when relation between tables not on primary key → Can't perform annotation on related table when un-managed model is backed by a DB view |
comment:12 by , 8 years ago
From what I understood you are still on 1.8 but this patch will only be available in a couple of months when 2.0 is released.
To get this fixed on your side in the mean time you'll have to define your own django.db.backends.postgresql.base.DatabaseWrapper
subclass but it's not as intimidating as it sounds.
Simply define a DatabaseWrapper
subclass in a chosen_module_name.base
module that extends the aforementioned class and set it's features_class
attribute to a subclass of django.db.backends.postgresql.features.DatabaseFeatures
with allows_group_by_selected_pks = False
django.db.backends.postgresql import base, features class DatabaseWrapper(base.DatabaseWrapper): class features_class(features.DatabaseFeatures): allows_group_by_selected_pks = False
Once this is done you just have to point the BACKEND
key of your DATABASES
setting to chosen_module
. I suggest you have a look at how django-transaction-hooks did it if you need a more concrete example.
Note that this disable the optimization all together. If you want to keep the optimization for managed models you'll have to define your own SQLCompiler
subclass and override the collapse_group_by
method which is a bit harder. I suggest you have a look how it's done for the MySQL backend to figure it out.
comment:13 by , 8 years ago
Thank you so much Simon - really above the call.
I am migrating to django1.11, which caused me to stumble into this -- I can certainly implement a workaround until we move to 2.0
thanks again for all your help -- much appreciated.
comment:14 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:17 by , 8 years ago
Ahh it's the exact same issue Claude thanks for noticing. Somehow I thought both reports were the same ticket.
comment:18 by , 8 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
This patch introduces a significant behavior change for all uses of unmanaged models (not only those that are backed by a database view) since Django 1.9. I believe the proper solution would be to just add the five lines of code that are posted in #comment:12 to the docs, somewhere near where it says that unmanaged models should be used for database views.
For what it's worth, I reported #27241 and implemented the same workaround, so this patch is unnecessary in my case. As Simon said in the commit message, Django doesn't support database views. Being forced to include a small workaround to do the unsupported seems like a small price to pay.
comment:19 by , 8 years ago
I was able to work-around this by using a Subquery to create the annotation -- willing to draft some documentation to this effect if needed.
Because this creates a backwards-compatibility issue, wondering if there should also be a note in the release notes somewhere if you go this route?
comment:20 by , 8 years ago
Jaap Roes,
At this point I'll take the issue to developers mailing list to gather more feedback and give this issue more exposure.
It's true that it could be causing a performance regression for unmanaged models in Django 2.0 relying on aggregation but since we've been documenting such models should be used to interface with views for so long I think there's still a point to be made about the fact the optimization broke the public API.
Let's not forget that pre-Django 1.9 users were using ORM aggregation on PostgreSQL just fine in most cases before the optimization landed and that it's only an issue when a model contains large columns.
comment:21 by , 8 years ago
Has patch: | unset |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:22 by , 8 years ago
I just submitted a post on the mailing list to gather feedback. It includes an alternative solution to make turning the optimization on for unmanaged models backed by tables easier. I'd appreciate if you could chime in Jaap Roes.
comment:23 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Hello, this change pretty much prevents me from upgrading to 2.x since my app displays data from a DB it doesn't own (so it is unmanaged) and the GROUP BY optimizations are required to get decent performance.
I have implemented changes proposed by Simon in this pull request: https://github.com/django/django/pull/11606. It attempts to auto-detect whether the unmanaged table is a view and enables the optimization when it is a regular table.
comment:24 by , 5 years ago
Has patch: | set |
---|
comment:25 by , 5 years ago
Patch needs improvement: | set |
---|
comment:26 by , 5 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:27 by , 5 years ago
Needs documentation: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:28 by , 5 years ago
Needs documentation: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
It's possible that's an edge case with the handling of
to_field
.Would it be possible for your to provide all the models involved in the queryset you're generating as it's really hard to figure out if the issue actually lies in the _GROUP BY_ logic without a set of models to reproduce the issue.