Opened 16 months ago

Closed 8 months ago

Last modified 8 months ago

#27241 closed Bug (fixed)

Annotate doesn't work with PostgreSQL views anymore

Reported by: Jaap Roes Owned by:
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: postgresql view aggregate annotate
Cc: Shai Berger, Simon Charette, Josh Smeaton, olivier.tabone@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using annotate on (unmanaged) models backed by views has stopped working and now throws a:

psycopg2.ProgrammingError: column "..." must appear in the GROUP BY clause or be used in an aggregate function...

This is caused by the fix to ticket #19259.

Queries with an aggregation now only list the PK field in the GROUP BY clause, but this doesn't work for views because they cannot have primary keys in PostgreSQL.

Now Django doesn't officially support views, or models without a primary key, but it did work before so it's somewhat of a regression.

I ran into this problem because I'm working on a legacy database and have models that are backed by a view using the db_table Meta option.

Change History (17)

comment:1 Changed 16 months ago by Jaap Roes

I've been able to work around this by defining my own DatabaseWrapper and DatabaseFeatures (which inherits from the ones in django.db.backends.postgresql) and setting allows_group_by_selected_pks to False. However, this disables this feature for all models not just the ones backed by a view, so it's not ideal.

comment:2 Changed 16 months ago by Tim Graham

Do you have a solution in mind?

comment:3 Changed 16 months ago by Jaap Roes

Unfortunately I don't have a solution that would just make things work.

One of the underlying issues here is that Django has never officially supported database views in the first place so I don't know how much effort should be put into making this work again. The fact that this used to work in the past was pure coincidence anyway.

Come to think of it, this problem might not just limited to database views. I believe any (unmanged) model that hasn't set a PRIMARY KEY constraint at the database level could trigger this issue.

comment:4 Changed 16 months ago by Shai Berger

Cc: Shai Berger added

Possible workaround (no time to test ATM): Before the annotate() call, add only() naming all the fields. Or maybe even drop the PK, if that makes sense.

comment:5 Changed 16 months ago by Jaap Roes

Using only seems to have no effect on the GROUP BY clause.

I am not sure what you mean by "drop the PK", Django doesn't allow models without a field marked as primary_key and will add one if you don't specify one.

comment:6 Changed 16 months ago by Simon Charette

Cc: Simon Charette added
Triage Stage: UnreviewedAccepted
Version: 1.10master

Before the annotate() call, add only() naming all the fields. Or maybe even drop the PK, if that makes sense.

The trick only() won't work as pk is always implicitly selected (only('field') == only('field', 'pk')).

The only solution I can think of is disabling the optimization for unmanaged models but the real issue here is really that unmanaged models are not meant to be used to deal with database views. The fact that Django enforces the presence of a primary key and views are not allowed to have one really highlights the conceptual clash.

As Django doesn't provide any way to interact with views and even suggests managed can be used for this purpose I think we should simply disable the optimization for this specific case.

comment:7 Changed 16 months ago by Jaap Roes

I don't known if disabling the optimisation for all unmanaged models is the right choice. It seems that this is a fairly esoteric edge case. The fact that this behaviour was changed in 1.9 and almost nobody has noticed until now speaks for itself (I can only find one other issue that looks similar: #26758).

I'm sure that most unmanaged models are normal database tables with a primary key constraint. It would be a shame to have to disable something that works fine for the vast majority of use cases just because of this one edge case.

Instead, would it be feasible to be able to opt-out of the optimisation by setting a flag on the model's Meta class? So instead of checking if managed is False, Django could check if allows_group_by_selected_pks is set to False for the model and then choose not to perform the related optimisation.

I am actively maintaining this application and would have no qualms about adding a flag to the few models affected by this issue. If anyone else runs into this the must also be actively maintaining theirs so they can do the same thing.

comment:8 Changed 16 months ago by Josh Smeaton

Can you try adding a field as a primary key on the view? Ideally each row would have a unique key in a view (I know this isn't always the case), but if you just tell Django that your unmanaged model has a primary key, it'll try to group on that.

For what it's worth I've always been a big user of unmanaged models over views - so I'm keen to see this resolved and not abandoned.

comment:9 Changed 16 months ago by Josh Smeaton

Cc: Josh Smeaton added

comment:10 Changed 16 months ago by Jaap Roes

As far as Django is concerned these views do have a primary key (there's a id field, it even has a unique index). The problem is that at the database level this field doesn't, and cannot, have a primary key constraint. Without a real primary key Postgres cannot group by primary key only, since they don't exist.

This is what happens if you try to add a PK to a (materialized) view:

=> ALTER MATERIALIZED VIEW my_view ADD CONSTRAINT my_view_pkey PRIMARY KEY (id);
ERROR:  "my_view" is not a table
Last edited 16 months ago by Jaap Roes (previous) (diff)

comment:11 Changed 16 months ago by Josh Smeaton

Yes, you're right. Sorry for the noise! I mistakenly thought it was Django throwing an error.

comment:12 Changed 16 months ago by Simon Charette

I don't think a meta option for this special case is worth it but a flag to prevent Django from implicitly adding an auto-incrementing primary key when none is explicitly specified could be useful in multiple cases (migrations comes to mind here) and leveraged in this one.

comment:13 Changed 15 months ago by Olivier Tabone

Cc: olivier.tabone@… added
Needs tests: set
Owner: changed from nobody to Olivier Tabone
Status: newassigned

comment:14 Changed 15 months ago by Olivier Tabone

Owner: Olivier Tabone deleted
Status: assignednew

comment:15 Changed 8 months ago by Simon Charette

Fixed in daf2bd3efe53cbfc1c9fd00222b8315708023792. #28107 was a duplicate of this ticket.

comment:16 Changed 8 months ago by Simon Charette

Resolution: fixed
Status: newclosed

comment:17 Changed 8 months ago by Simon Charette

I just submitted a post on the mailing list to gather feedback. It includes an alternative solution to make turning the optimization on for unmanaged models backed by tables easier.

Note: See TracTickets for help on using tickets.
Back to Top