Opened 17 months ago

Last modified 17 months ago

#32297 new Bug

QuerySet.get() method not working as expected with Window functions

Reported by: Jerin Peter George Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Mariusz Felisiak, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jerin Peter George)

I have a simple model as below,

class Employee(models.Model):
    name = models.CharField(max_length=40, blank=False, null=False)
    salary = models.PositiveIntegerField()
    department = models.CharField(max_length=40, blank=False, null=False)
    hire_date = models.DateField(blank=False, null=False)
    age = models.IntegerField(blank=False, null=False)
    bonus = models.DecimalField(decimal_places=2, max_digits=15, null=True)

and I queried

qs = Employee.objects.annotate(
    rank=Window(expression=Rank(), order_by=F("salary").desc()),
)

and then I have called the get() method to get the rank of a single specific object

qs.get(pk=pk).rank

Issue

The get(pk=pk).rank returning a constant value irrespective of the kwargs applied to the get() method

Change History (6)

comment:1 Changed 17 months ago by Jerin Peter George

Description: modified (diff)

comment:2 Changed 17 months ago by Jerin Peter George

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

comment:3 in reply to:  2 Changed 17 months ago by starryrbs

If you use qs.get(pk=pk).rank, the generated SQL statement is like this:

SELECT "expressions_window_employee"."id", "expressions_window_employee"."name", "expressions_window_employee"."salary", "expressions_window_employee"."department", "expressions_window_employee"."hire_date", "expressions_window_employee"."age", "expressions_window_employee"."classification_id", "expressions_window_employee"."bonus", RANK() OVER (ORDER BY "expressions_window_employee"."salary" DESC) AS "rank" FROM "expressions_window_employee" WHERE "expressions_window_employee"."id" = 11

If you want to query the value of rank, you can do this:

rank_dict = {item['pk']:item['rank'] for item in qs.values("pk", "rank")}
rank_dict.get(12)

Replying to Jerin Peter George:

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

comment:4 Changed 17 months ago by Jerin Peter George

I am aware that I can do something similar as you suggested for a "workaround" to get the desired result. But the real question is whether this is the expected behavior for the Window function?

I am sure that I can do a similar query using the Count(...) expression as

from django.db import models


class Musician(models.Model):
    name = models.CharField(max_length=50)


class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE, related_name="albums")
    name = models.CharField(max_length=100)


qs = Musician.objects.annotate(album_count=models.Count("albums"))
print(qs.get(pk=2).album_count)

comment:5 Changed 17 months ago by Carlton Gibson

Cc: Mariusz Felisiak Simon Charette added
Triage Stage: UnreviewedAccepted

I'm going to provisionally accept this and ask Simon and Mariusz to have a look, since there does seem to be something amiss.

Taking a similar case from one of my own projects, it's not just values (which might be/have been the issue) that shows the strange behaviour, but straight iteration:

>>> list((t.pk, t.rank) for t in Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc()))[:10])
[(1444, 1), (1443, 2), (1442, 3), (1441, 4), (1440, 5), (1439, 6), (1438, 7), (1437, 8), (1436, 9), (1435, 10)]
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1444).rank
1
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1442).rank
1

I'm very much expecting the last line there to output 3. (Although immediately on writing that I'm thinking, but there is only one result in the query, so it would have rank 1… — in which case it's a just a usage issue, and can be closed.)

Last edited 17 months ago by Carlton Gibson (previous) (diff)

comment:6 Changed 17 months ago by Simon Charette

It's just how window functions behave when mixed with a query level constraint (WHERE) or LIMIT'ing;

In your first example the [:10] performs a LIMIT 10 so the RANK will be over ten results while get(pk=1442) will be against in a single result through WHERE id = 1442 LIMIT 21.

To me this is a similar class of problem to #24462, #28333. If it was possible to use the current result set as a subquery then the following would work as expected

Task.objects.annotate(
    rank=Window(expression=Rank(), order_by=F('logged').desc())
)[0:10].subquery().get(pk=1442)

As it would result in

SELECT * FROM (
    SELECT task.*, RANK() OVER (ORDER BY logged DESC) AS "rank"
    FROM task
    LIMIT 10
) subquery WHERE id=1442 LIMIT 21

Where the window function would span over the full ten set of rows.

Whether or not get should do the subquery pushdown automatically is debatable. I get a feeling it should as the fact it uses LIMIT is more of an implementation detail but it would be slightly backward incompatible.

Note: See TracTickets for help on using tickets.
Back to Top