Opened 2 years ago
Closed 2 years ago
#33772 closed Cleanup/optimization (fixed)
Queryset first() returns different result than queryset[0] when using ExpressionWrapper
Reported by: | Pablo Pissi | Owned by: | Pablo Pissi |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | ExpressionWrapper, first, queryset |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi! I was creating a new app and found the following scenario, seems like an issue to me:
I have three models:
class Transaction(models.Model): amount = models.DecimalField(max_digits=10, decimal_places=2) description = models.CharField(max_length=255) created_by = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="created_transactions") created_at = models.DateTimeField(auto_now_add=True) class TransactionUser(models.Model): user = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="transactions") transaction = models.ForeignKey("bills.Transaction", on_delete=models.CASCADE, related_name="users") amount = models.DecimalField(max_digits=10, decimal_places=2) transaction_type = models.ForeignKey('bills.TransactionType', on_delete=models.PROTECT) class TransactionType(models.Model): name = models.CharField(max_length=50) code = models.CharField(max_length=3) multiplier = models.DecimalField(max_digits=5, decimal_places=2)
I create the following instances:
transaction = Transaction(amount=100) positive = TransactionType(multiplier=1) negative = TransactionType(multiplier=-1) TransactionUser(transaction=transaction, user=user1, amount=50, transaction_type=negative) TransactionUser(transaction=transaction, user=user2, amount=50, transaction_type=negative) TransactionUser(transaction=transaction, user=user1, amount=10, transaction_type=positive)
I want to obtain the balance of each transaction by doing the following query:
TransactionUser.objects \ .filter(user_id=user_id) \ .annotate( real_amount=ExpressionWrapper(F("amount")*F("transaction_type__multiplier"), output_field=DecimalField()), ).values("transaction") \ .annotate( balance=Sum("real_amount") )
If I log the queryset returned, the value logged is:
<QuerySet [{'transaction': 1, 'balance': Decimal('-40.0000')}]>
But if I log
queryset.first()["balance"] >>> -50
And if I log
queryset[0]["balance"] >>> -40
It's also reproducible in 4.0
Change History (12)
comment:1 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 2 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Hi,
I've read the docs about ordering, but in this case the queryset has only one value and the value returned by .first() is not a value that exists on the queryset, is that correct?
It's returning values from the non-aggregated queryset
comment:3 by , 2 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
It's also reproducible …
Hi Pablo — it's looks like a usage problem, so the support channels are the place to go, but if you can create a sample project that demonstrates the issue, perhaps in a failing test case, I'm happy to look further.
comment:4 by , 2 years ago
Hi again!
Here is a repo with a test that shows the issue: https://github.com/pablop94/django-poc
Let me know if you need more info.
comment:5 by , 2 years ago
The addition of order_by
columns to the GROUP BY
clause specified by values().annotate()
is documented.
The crux of the issue remain; you are asking for an element at a particular index of a collection with an undefined ordering. Adding order_by("example_model")
to your test case addresses the irregularity you are experiencing with first()
.
first()
could arguably be adjusted to raise an exception when attempted to be used on an unordered collection to refuse the temptation to guess but that would require a deprecation period. Another approach could be to only raise an exception when first()
is called on a unordered queryset with an explicit values()
aggregation grouping that doesn't include pk
which is the case reported here.
Here's what the code could look like for a systematic error on attempts to use first()
on a undefined ordering queryset performing aggregation over specific columns
-
django/db/models/query.py
diff --git a/django/db/models/query.py b/django/db/models/query.py index 67fcb411eb..da3d1ff3a7 100644
a b async def alatest(self, *fields): 1032 1032 1033 1033 def first(self): 1034 1034 """Return the first object of a query or None if no match is found.""" 1035 for obj in (self if self.ordered else self.order_by("pk"))[:1]: 1035 queryset = self 1036 if not self.ordered: 1037 if isinstance(self.query.group_by, tuple): 1038 raise TypeError( 1039 "Cannot use first() on an unordered queryset performing aggregation" 1040 ) 1041 queryset = self.order_by("pk") 1042 for obj in queryset[:1]: 1036 1043 return obj 1037 1044 1038 1045 async def afirst(self):
And a somewhat more complex implementation that still allow the pk
-
django/db/models/query.py
diff --git a/django/db/models/query.py b/django/db/models/query.py index 67fcb411eb..0534a2cc16 100644
a b async def alatest(self, *fields): 1032 1032 1033 1033 def first(self): 1034 1034 """Return the first object of a query or None if no match is found.""" 1035 for obj in (self if self.ordered else self.order_by("pk"))[:1]: 1035 queryset = self 1036 if not self.ordered: 1037 if isinstance(self.query.group_by, tuple) and not any( 1038 col.output_field is self.model._meta.pk for col in self.query.group_by 1039 ): 1040 raise TypeError( 1041 "Cannot use first() on an unordered queryset performing aggregation " 1042 "over a set of expressions that doesn't include the base model's primary key" 1043 ) 1044 queryset = self.order_by("pk") 1045 for obj in queryset[:1]: 1036 1046 return obj 1037 1047 1038 1048 async def afirst(self):
The latter patch seems to be passing the test suite.
Thoughts on the value of such adjustment Carlton?
comment:6 by , 2 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Thanks both. Good.
Thoughts on the value of such adjustment Carlton?
We do see tickets periodically running into this problem: I recall a number of occasions seeing links to that same docs section.
I guess for every one that we see there are several/many in the wild we don't
The first()
docs link to the Interaction with order_by() section there, presumably to try to forestall this confusion, but I still get the impression it's one of the more subtle points of the ORM behaviour, so if we can provide a better error message to point folks in the right direction that's likely a win. 🤔
I'll provisionally accept on that basis. (Mariusz can object on review if he doesn't like it 😃)
Pablo would you like to work on a PR based on Simon's suggestion?
comment:7 by , 2 years ago
Yes, I could work on a PR this weekend. Maybe I can add a documentation note to the first function too.
This change will require a deprecation period right?
comment:8 by , 2 years ago
This change will require a deprecation period right?
I don't think so. We're (just) adding a warning for an already incorrect usage.
comment:9 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:10 by , 2 years ago
Has patch: | set |
---|
Hi!
I've created this PR. I've also updated the last() function: https://github.com/django/django/pull/15769
comment:11 by , 2 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
This is happening because you don't explicitly define an ordering for your queryset so the database can return whatever rows it wants in the case of
[0]
while.first()
explicitly order bypk
if no ordering is specified.Please TicketClosingReasons/UseSupportChannels before filing issues in this ticket tracker.