Opened 4 years ago
Last modified 3 years ago
#32297 assigned Bug
QuerySet.get() method not working as expected with Window functions
Description (last modified by ) ¶
I have a simple model as below,
class Employee(models.Model): name = models.CharField(max_length=40, blank=False, null=False) salary = models.PositiveIntegerField() department = models.CharField(max_length=40, blank=False, null=False) hire_date = models.DateField(blank=False, null=False) age = models.IntegerField(blank=False, null=False) bonus = models.DecimalField(decimal_places=2, max_digits=15, null=True)
and I queried
qs = Employee.objects.annotate( rank=Window(expression=Rank(), order_by=F("salary").desc()), )
and then I have called the get()
method to get the rank of a single specific object
qs.get(pk=pk).rank
Issue
The get(pk=pk).rank
returning a constant value irrespective of the kwargs
applied to the get()
method
According to the ticket's flags, the next step(s) to move this issue forward are:
- To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is:
[https://github.com/django/django/pull/#### PR]
.
Change History (7)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
follow-up: 3 comment:2 by , 4 years ago
comment:3 by , 4 years ago
If you use qs.get(pk=pk).rank, the generated SQL statement is like this:
SELECT "expressions_window_employee"."id", "expressions_window_employee"."name", "expressions_window_employee"."salary", "expressions_window_employee"."department", "expressions_window_employee"."hire_date", "expressions_window_employee"."age", "expressions_window_employee"."classification_id", "expressions_window_employee"."bonus", RANK() OVER (ORDER BY "expressions_window_employee"."salary" DESC) AS "rank" FROM "expressions_window_employee" WHERE "expressions_window_employee"."id" = 11
If you want to query the value of rank, you can do this:
rank_dict = {item['pk']:item['rank'] for item in qs.values("pk", "rank")} rank_dict.get(12)
Replying to Jerin Peter George:
Test case to reproduce the issue
import datetime from decimal import Decimal from django.db.models import F, Window from django.db.models.functions import Rank from django.test import TestCase from .models import Employee class WindowFunctionTests(TestCase): @classmethod def setUpTestData(cls): Employee.objects.bulk_create([ Employee( name=e[0], salary=e[1], department=e[2], hire_date=e[3], age=e[4], bonus=Decimal(e[1]) / 400, ) for e in [ ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20), ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20), ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20), ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50), ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30), ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30), ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30), ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20), ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40), ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40), ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40), ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50), ] ]) def test_rank_with_queryset_get_method(self): qs = Employee.objects.annotate( rank=Window(expression=Rank(), order_by=F("salary").desc()), ) rank_set = list(qs.values_list("pk", "rank")) rank_set_iter = [(emp.pk, emp.rank) for emp in qs] self.assertEqual(rank_set, rank_set_iter) # does queryset iteration has any problem? for pk, rank in rank_set: self.assertEqual(qs.get(pk=pk).rank, rank) # does `.get()` has any problem?
comment:4 by , 4 years ago
I am aware that I can do something similar as you suggested for a "workaround" to get the desired result. But the real question is whether this is the expected behavior for the Window function?
I am sure that I can do a similar query using the Count(...)
expression as
from django.db import models class Musician(models.Model): name = models.CharField(max_length=50) class Album(models.Model): artist = models.ForeignKey(Musician, on_delete=models.CASCADE, related_name="albums") name = models.CharField(max_length=100) qs = Musician.objects.annotate(album_count=models.Count("albums")) print(qs.get(pk=2).album_count)
comment:5 by , 4 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
I'm going to provisionally accept this and ask Simon and Mariusz to have a look, since there does seem to be something amiss.
Taking a similar case from one of my own projects, it's not just values
(which might be/have been the issue) that shows the strange behaviour, but straight iteration:
>>> list((t.pk, t.rank) for t in Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc()))[:10]) [(1444, 1), (1443, 2), (1442, 3), (1441, 4), (1440, 5), (1439, 6), (1438, 7), (1437, 8), (1436, 9), (1435, 10)] >>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1444).rank 1 >>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1442).rank 1
I'm very much expecting the last line there to output 3. (Although immediately on writing that I'm thinking, but there is only one result in the query, so it would have rank 1… — in which case it's a just a usage issue, and can be closed.)
comment:6 by , 4 years ago
It's just how window functions behave when mixed with a query level constraint (WHERE
) or LIMIT
'ing;
In your first example the [:10]
performs a LIMIT 10
so the RANK
will be over ten results while get(pk=1442)
will be against in a single result through WHERE id = 1442 LIMIT 21
.
To me this is a similar class of problem to #24462, #28333. If it was possible to use the current result set as a subquery then the following would work as expected
Task.objects.annotate( rank=Window(expression=Rank(), order_by=F('logged').desc()) )[0:10].subquery().get(pk=1442)
As it would result in
SELECT * FROM ( SELECT task.*, RANK() OVER (ORDER BY logged DESC) AS "rank" FROM task LIMIT 10 ) subquery WHERE id=1442 LIMIT 21
Where the window function would span over the full ten set of rows.
Whether or not get
should do the subquery pushdown automatically is debatable. I get a feeling it should as the fact it uses LIMIT
is more of an implementation detail but it would be slightly backward incompatible.
comment:7 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Test case to reproduce the issue