Django

Code

Ticket #6422 (new)

Opened 2 years ago

Last modified 15 hours ago

Support for 'DISTINCT ON' queries with QuerySet.distinct()

Reported by: Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE> Assigned to: nobody
Milestone: Component: Database layer (models, ORM)
Version: SVN Keywords:
Cc: cornbread, ramusus@gmail.com, tim.babych@gmail.com, hr.bjarni+django@gmail.com Triage Stage: Accepted
Has patch: 1 Needs documentation: 1
Needs tests: 1 Patch needs improvement: 1

Description

The patch included modifies django/db/models/query.py so that the distinct method of the QuerySet object optionally takes variable number of field names and, if given, modifies the SQL statements generated so that 'DISTINCT ON (field,...)' is used. The incentive is to allow things like described in the following example:

class Example(models.Model):
  name = models.TextField()
  date = models. DateTimeField()
  other = models.XXX()
  ...

qs = Example.objects.all().distinct('name').order_by('name','-date')

Now qs will return the latest entry for each distinct name. This cannot otherwise be achieved unless resorting to plain SQL.

It should be noted that - at least in standard SQL and Postgres - if name allows NULL values, the query will return all entries for which name is NULL not only the latest one as one might expect.

There will be a couple of '# CHECKME:' comments scattered over the patched file. I tried to mark all places where the change might have a side effect but it wasn't apparent to me if and how to fix it. I.e. there should be no side effect from the code outside the commented regions.

Attachments

query.py.patch (3.9 kB) - added by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE> on 01/19/08 08:56:40.
Patch for django/db/models/query.py
query.py.2.patch (4.5 kB) - added by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE> on 02/03/08 07:06:01.
New patch fixes count method and resolves quoting issues

Change History

01/19/08 08:56:40 changed by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE>

  • attachment query.py.patch added.

Patch for django/db/models/query.py

01/19/08 09:22:42 changed by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

It should further be noted that - at least with Postgres - fields specified with distinct() have to appear first with the order_by() statement.

01/19/08 09:23:44 changed by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE>

  • has_patch set to 1.

02/03/08 07:06:01 changed by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE>

  • attachment query.py.2.patch added.

New patch fixes count method and resolves quoting issues

02/03/08 07:19:52 changed by Manfred Wassmann <manolo@NCC-1701.B.Shuttle.DE>

I have uploaded a new patch which fixes the handling of multiple fields in a DISTINCT ON query as well as the count method for DISTINCT ON queries and resolves the qouting issues.

For DISTINCT ON queries the distinct method has now to be invoked as distinct(on_fields=<FIELD_LIST>), e.g. distinct(on_fields=('field1','field2')). For convenience a single field name can be passed as a plain string like distinct(on_fields='fieldname').

03/04/08 09:52:09 changed by mtredinnick

  • needs_better_patch set to 1.
  • stage changed from Unreviewed to Accepted.
  • needs_tests set to 1.
  • needs_docs set to 1.

Patches against trunk's query.py aren't particularly useful at the moment, since that code has been more or less entirely rewritten in the queryset-refactor branch. So, if you feel enthusiastic, you could try to update this to work with that branch, otherwise it will need to be rewritten after the branch is merged into trunk.

As to the feature itself, the "on_fields" parameter doesn't really fit with the rest of the API. Call the parameter "fields" for consistency. In fact, given that distinct() takes no parameters currently, it probably makes the most sense just to pass it field names: so it will take *args. That's consistent with values(), valuelist() and select_related().

I notice the patch doesn't contain any tests, which will be necessary, since we need to be able to verify it's correct. Make sure you test any interaction with the .count() method on querysets, since that required some special handling with normal .distinct() calls. Documentation needed as well.

02/24/09 17:16:09 changed by ikelly

  • cc set to ikelly.

07/12/09 15:39:42 changed by cornbread

  • cc changed from ikelly to cornbread.

I just ran into this issue. I am trying to return unique Session keys from a table. I tried PageRequest?.objects.order_by('session').distinct().count() and I get 217 records and not 1 that I should be getting. Would be nice to just be able to do PageRequest?.objects.all().distinct('session').count()

Looking forward to your new patch Manfred Wassmann. Thanks for the great work!

07/20/09 08:08:45 changed by ramusus

  • cc changed from cornbread to cornbread, ramusus@gmail.com.

09/10/09 12:59:25 changed by tymofiy

  • cc changed from cornbread, ramusus@gmail.com to cornbread, ramusus@gmail.com, tim.babych@gmail.com.

01/13/10 06:06:55 changed by hejsan

  • cc changed from cornbread, ramusus@gmail.com, tim.babych@gmail.com to cornbread, ramusus@gmail.com, tim.babych@gmail.com, hr.bjarni+django@gmail.com.

03/18/10 21:38:05 changed by mnbayazit

Any updates on this? Why isn't it being merged with trunk? It's been 2 years!


Add/Change #6422 (Support for 'DISTINCT ON' queries with QuerySet.distinct())




Change Properties
Action