Opened 4 years ago
Last modified 2 years ago
#32297 assigned Bug
QuerySet.get() method not working as expected with Window functions
Reported by: | Jerin Peter George | Owned by: | Saeed Hasani Borzadaran |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Mariusz Felisiak, Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I have a simple model as below,
class Employee(models.Model): name = models.CharField(max_length=40, blank=False, null=False) salary = models.PositiveIntegerField() department = models.CharField(max_length=40, blank=False, null=False) hire_date = models.DateField(blank=False, null=False) age = models.IntegerField(blank=False, null=False) bonus = models.DecimalField(decimal_places=2, max_digits=15, null=True)
and I queried
qs = Employee.objects.annotate( rank=Window(expression=Rank(), order_by=F("salary").desc()), )
and then I have called the get()
method to get the rank of a single specific object
qs.get(pk=pk).rank
Issue
The get(pk=pk).rank
returning a constant value irrespective of the kwargs
applied to the get()
method
Change History (7)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
follow-up: 3 comment:2 by , 4 years ago
comment:3 by , 4 years ago
If you use qs.get(pk=pk).rank, the generated SQL statement is like this:
SELECT "expressions_window_employee"."id", "expressions_window_employee"."name", "expressions_window_employee"."salary", "expressions_window_employee"."department", "expressions_window_employee"."hire_date", "expressions_window_employee"."age", "expressions_window_employee"."classification_id", "expressions_window_employee"."bonus", RANK() OVER (ORDER BY "expressions_window_employee"."salary" DESC) AS "rank" FROM "expressions_window_employee" WHERE "expressions_window_employee"."id" = 11
If you want to query the value of rank, you can do this:
rank_dict = {item['pk']:item['rank'] for item in qs.values("pk", "rank")} rank_dict.get(12)
Replying to Jerin Peter George:
Test case to reproduce the issue
import datetime from decimal import Decimal from django.db.models import F, Window from django.db.models.functions import Rank from django.test import TestCase from .models import Employee class WindowFunctionTests(TestCase): @classmethod def setUpTestData(cls): Employee.objects.bulk_create([ Employee( name=e[0], salary=e[1], department=e[2], hire_date=e[3], age=e[4], bonus=Decimal(e[1]) / 400, ) for e in [ ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20), ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20), ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20), ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50), ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30), ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30), ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30), ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20), ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40), ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40), ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40), ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50), ] ]) def test_rank_with_queryset_get_method(self): qs = Employee.objects.annotate( rank=Window(expression=Rank(), order_by=F("salary").desc()), ) rank_set = list(qs.values_list("pk", "rank")) rank_set_iter = [(emp.pk, emp.rank) for emp in qs] self.assertEqual(rank_set, rank_set_iter) # does queryset iteration has any problem? for pk, rank in rank_set: self.assertEqual(qs.get(pk=pk).rank, rank) # does `.get()` has any problem?
comment:4 by , 4 years ago
I am aware that I can do something similar as you suggested for a "workaround" to get the desired result. But the real question is whether this is the expected behavior for the Window function?
I am sure that I can do a similar query using the Count(...)
expression as
from django.db import models class Musician(models.Model): name = models.CharField(max_length=50) class Album(models.Model): artist = models.ForeignKey(Musician, on_delete=models.CASCADE, related_name="albums") name = models.CharField(max_length=100) qs = Musician.objects.annotate(album_count=models.Count("albums")) print(qs.get(pk=2).album_count)
comment:5 by , 4 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
I'm going to provisionally accept this and ask Simon and Mariusz to have a look, since there does seem to be something amiss.
Taking a similar case from one of my own projects, it's not just values
(which might be/have been the issue) that shows the strange behaviour, but straight iteration:
>>> list((t.pk, t.rank) for t in Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc()))[:10]) [(1444, 1), (1443, 2), (1442, 3), (1441, 4), (1440, 5), (1439, 6), (1438, 7), (1437, 8), (1436, 9), (1435, 10)] >>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1444).rank 1 >>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1442).rank 1
I'm very much expecting the last line there to output 3. (Although immediately on writing that I'm thinking, but there is only one result in the query, so it would have rank 1… — in which case it's a just a usage issue, and can be closed.)
comment:6 by , 4 years ago
It's just how window functions behave when mixed with a query level constraint (WHERE
) or LIMIT
'ing;
In your first example the [:10]
performs a LIMIT 10
so the RANK
will be over ten results while get(pk=1442)
will be against in a single result through WHERE id = 1442 LIMIT 21
.
To me this is a similar class of problem to #24462, #28333. If it was possible to use the current result set as a subquery then the following would work as expected
Task.objects.annotate( rank=Window(expression=Rank(), order_by=F('logged').desc()) )[0:10].subquery().get(pk=1442)
As it would result in
SELECT * FROM ( SELECT task.*, RANK() OVER (ORDER BY logged DESC) AS "rank" FROM task LIMIT 10 ) subquery WHERE id=1442 LIMIT 21
Where the window function would span over the full ten set of rows.
Whether or not get
should do the subquery pushdown automatically is debatable. I get a feeling it should as the fact it uses LIMIT
is more of an implementation detail but it would be slightly backward incompatible.
comment:7 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Test case to reproduce the issue