Opened 2 years ago

Last modified 2 years ago

#25298 assigned Cleanup/optimization

Forbid QuerySet filtering by a related field that expects a single value when multiple values are returned

Reported by: Tim Graham Owned by: Y3K
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords:
Cc: github@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As raised in #25284, different databases error in different cases on queries like Model.objects.filter(related_id=RelatedModel.objects.all())

  • PostgreSQL, always.
  • MySQL, if the subquery returns multiple results.
  • SQLite, never.

It would be useful to see if we could always throw an error in this case.

Change History (8)

comment:1 Changed 2 years ago by Richard Eames

Cc: github@… added

comment:2 Changed 2 years ago by Y3K

Owner: changed from nobody to Y3K
Status: newassigned

I'm taking this one.

Will update once I send the Github pullrequest.

comment:3 Changed 2 years ago by Anssi Kääriäinen

Notably Foo.objects.filter(related_model__exact=RelatedModel.objects.values('pk')) should work when the RelatedModel query returns just a single row. This is a completely valid query, equivalent to:

SELECT ... FROM foo WHERE related_model_id = (SELECT id FROM relatedmodel)

If the subquery returns one or zero rows, this query should work on all backends.

comment:4 Changed 2 years ago by Richard Eames

For reference, here's some sql fiddles:
select * from Child where rel_id = (select id from Parent where cnt=?);

Engine SubQuery Rows Result Link
MySQL 0 No error, 0 records http://sqlfiddle.com/#!9/1d497/1
MySQL 1 No error, 1 record http://sqlfiddle.com/#!9/1d497/2
MySQL 2 Error http://sqlfiddle.com/#!9/1d497/3
PostgreSQL 0 No error, 0 records http://sqlfiddle.com/#!15/1d497/3
PostgreSQL 1 No error, 1 record http://sqlfiddle.com/#!15/1d497/2
PostgreSQL 2 Error http://sqlfiddle.com/#!15/1d497/1
SQLite 0 No error, 0 records http://sqlfiddle.com/#!7/1d497/3
SQLite 1 No error, 1 record http://sqlfiddle.com/#!7/1d497/2
SQLite 2 No error, 2 records http://sqlfiddle.com/#!7/1d497/1

comment:5 Changed 2 years ago by Y3K

Thank you very much for your feedback Naddiseo & akaariai

I'm not very experienced with raw SQL and your comments really help to get on the right track.

comment:6 in reply to:  3 Changed 2 years ago by Y3K

Replying to akaariai:

Notably Foo.objects.filter(related_model__exact=RelatedModel.objects.values('pk')) should work when the RelatedModel query returns just a single row. This is a completely valid query, equivalent to:

SELECT ... FROM foo WHERE related_model_id = (SELECT id FROM relatedmodel)

If the subquery returns one or zero rows, this query should work on all backends.

Hello akaariai.

Don't you think it would be better to forbid such cases as well? If by any means the subquery returns more than one row it'll fail. I think we can save many user errors preventing this before it happens, if we raise an exception whenever a subquery that may return multiple rows is used the users will know and prevent said cases.

Open to comments and suggestions, of course.

comment:7 Changed 2 years ago by Y3K

It looks like it isn't possible to know beforehand if the subquery will return more than one row (unless we execute the query to check first, but we all know this isn't a sane solution), therefore the two possibles solutions I got while discussing this at #django-dev are:

  • Remove said feature at all, validating the queryset passed to the filter, raising an exception if it can return more than one row.
  • Leave the feature as-is and add a warning on the documentation, specifying that SQLite lacks of said validation.

I request more comments / feedback about this please. What should be the desired solution here and why.

Thanks.

Last edited 2 years ago by Y3K (previous) (diff)

comment:8 Changed 2 years ago by Richard Eames

To re-add my opinion on this from the other issue: I think the ORM should be consistent across backends, so it should throw an exception if the inner queryset could return more than a single row. My reasoning is basically to prevent vendor lock-in; Django was (is?) marketed as being mix-and-match for the backends such as database and templating, and Django would take care of the rest. (But, I'm fairly biased on this since I just spent a few unpleasant weeks converting raw mysql in our code to use ORM so that we can eventually switch to postgresql.)

For the use-case that akaariai brought up, perhaps it would have to be changed to accept a limit of one:

Foo.objects.filter(related_model__exact=RelatedModel.objects.values('pk')[:1])

# SELECT * FROM Foo WHERE related_model_id = (SELECT pk FROM RelatedModel LIMIT 1);

This way it's explicit that you only want 0 or 1 returned from the subquery. As a side note, does __exact allow passing in a model instance?

Foo.objects.filter(related_model__exact=RelatedModel.objects.get(pk=1))

Of course, it's not exactly the same since the .get can throw a DoesNotExist exception, and it generates two queries.

An alternate solution is to implicitly add limits, or change = to IN if the subquery may return multiple results. However, I am against this behaviour since - although it may be intuitive - it's implicit and may be the result of a programmer error that should really be raised as an error instead of silently accepted.

Note: See TracTickets for help on using tickets.
Back to Top