Opened 3 years ago

Closed 3 years ago

Last modified 10 months ago

#32433 closed Bug (fixed)

Delete distinct produces an unexpected and potentially harmful SQL

Reported by: egism Owned by: egism
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: orm, delete, distinct
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by egism)

I was looking for a way to delete the first Comment of each Post (a sample domain). Since I know that every new Post starts with a system generated comment I decided to go with:

Comment.objects.order_by('post_id', 'created_at').distinct('post_id').delete()

Before proceeding I tested it with:

Comment.objects.order_by('post_id', 'created_at').distinct('post_id').count()

Made sure the result actually contains what I needed and proceeded with the delete(). The result was rather surprising. I was notified that the whole table was wiped clean. I then checked the actual SQL that was executed and it was a simple DELETE FROM comments;.

As an ORM user, I would say it is the worst outcome possible and I would at least expect an error in such a case or ideally a SQL of what I was trying to achieve. At the same time, count and delete produces an inconsistent result which is even more mistaking.

Potential solutions:

  • raise an error with a decent explanation
  • produce a desired SQL according to the query

Since I have never submitted a change to Django, I have a very limited knowledge of the ORM and its intricacies. I could give it a try and issue a patch for this with some guidance.

Change History (10)

comment:1 by egism, 3 years ago

Description: modified (diff)

comment:2 by Simon Charette, 3 years ago

Keywords: postgresql removed
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

I think we should start by failing loudly when distinct().delete() is used as adding support for it would require a subquery pushdown (DELETE FROM table WHERE field IN (SELECT DISTINCT field FROM table) which would require a non negligible amount of work and additional complexity for the rare cases where this is useful and a manual __in subquery lookup can be used to achieve the same results e.g.

Comment.objects.filter(
    post_id__in=Comment.objects.order_by('post_id', 'created_at').distinct('post_id').values('post_id')
).delete()

egism if you want to work on a patch you should raise a TypeError in QuerySet.delete if self.query.distinct or self.query.distinct_fields. A regression test can then be added to delete_regress/tests (source).

in reply to:  2 comment:3 by egism, 3 years ago

Hi Simon,

Sounds very simple, I will give it a go. Will be a good opportunity to make my first contribution. Thanks for directions.

comment:4 by egism, 3 years ago

Owner: changed from nobody to egism
Status: newassigned

comment:6 by Mariusz Felisiak, 3 years ago

Has patch: set

comment:7 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In 6307c3f1:

Fixed #32433 -- Added error message on QuerySet.delete() following distinct().

comment:9 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In aa1aed92:

[3.2.x] Fixed #32433 -- Added error message on QuerySet.delete() following distinct().

Backport of 6307c3f1a123f5975c73b231e8ac4f115fd72c0d from master

comment:10 by Mariusz Felisiak <felisiak.mariusz@…>, 10 months ago

In 28e20771:

Refs #32433 -- Reallowed calling QuerySet.delete() after distinct().

While values(*field_excluding_pk).distinct() and
distinct(*field_excluding_pk) can reduce the number of resulting rows
in a way that makes subsequent delete() calls ambiguous standalone
.distinct() calls cannot.

Since delete() already disallows chain usages with values() the only
case that needs to be handled, as originally reported, is when
DISTINCT ON is used via distinct(*fields).

Refs #32682 which had to resort to subqueries to prevent duplicates in
the admin and caused significant performance regressions on MySQL
(refs #34639).

This partly reverts 6307c3f1a123f5975c73b231e8ac4f115fd72c0d.

Note: See TracTickets for help on using tickets.
Back to Top