#33772 closed Cleanup/optimization (fixed)
Queryset first() returns different result than queryset[0] when using ExpressionWrapper
| Reported by: | Pablo Pissi | Owned by: | Pablo Pissi |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.2 |
| Severity: | Normal | Keywords: | ExpressionWrapper, first, queryset |
| Cc: | Triage Stage: | Ready for checkin | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Hi! I was creating a new app and found the following scenario, seems like an issue to me:
I have three models:
class Transaction(models.Model):
amount = models.DecimalField(max_digits=10, decimal_places=2)
description = models.CharField(max_length=255)
created_by = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="created_transactions")
created_at = models.DateTimeField(auto_now_add=True)
class TransactionUser(models.Model):
user = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="transactions")
transaction = models.ForeignKey("bills.Transaction", on_delete=models.CASCADE, related_name="users")
amount = models.DecimalField(max_digits=10, decimal_places=2)
transaction_type = models.ForeignKey('bills.TransactionType', on_delete=models.PROTECT)
class TransactionType(models.Model):
name = models.CharField(max_length=50)
code = models.CharField(max_length=3)
multiplier = models.DecimalField(max_digits=5, decimal_places=2)
I create the following instances:
transaction = Transaction(amount=100)
positive = TransactionType(multiplier=1)
negative = TransactionType(multiplier=-1)
TransactionUser(transaction=transaction, user=user1, amount=50, transaction_type=negative)
TransactionUser(transaction=transaction, user=user2, amount=50, transaction_type=negative)
TransactionUser(transaction=transaction, user=user1, amount=10, transaction_type=positive)
I want to obtain the balance of each transaction by doing the following query:
TransactionUser.objects \
.filter(user_id=user_id) \
.annotate(
real_amount=ExpressionWrapper(F("amount")*F("transaction_type__multiplier"), output_field=DecimalField()),
).values("transaction") \
.annotate(
balance=Sum("real_amount")
)
If I log the queryset returned, the value logged is:
<QuerySet [{'transaction': 1, 'balance': Decimal('-40.0000')}]>
But if I log
queryset.first()["balance"] >>> -50
And if I log
queryset[0]["balance"] >>> -40
It's also reproducible in 4.0
Change History (15)
comment:1 by , 3 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
comment:2 by , 3 years ago
| Resolution: | invalid |
|---|---|
| Status: | closed → new |
Hi,
I've read the docs about ordering, but in this case the queryset has only one value and the value returned by .first() is not a value that exists on the queryset, is that correct?
It's returning values from the non-aggregated queryset
comment:3 by , 3 years ago
| Resolution: | → needsinfo |
|---|---|
| Status: | new → closed |
It's also reproducible …
Hi Pablo — it's looks like a usage problem, so the support channels are the place to go, but if you can create a sample project that demonstrates the issue, perhaps in a failing test case, I'm happy to look further.
comment:4 by , 3 years ago
Hi again!
Here is a repo with a test that shows the issue: https://github.com/pablop94/django-poc
Let me know if you need more info.
comment:5 by , 3 years ago
The addition of order_by columns to the GROUP BY clause specified by values().annotate() is documented.
The crux of the issue remain; you are asking for an element at a particular index of a collection with an undefined ordering. Adding order_by("example_model") to your test case addresses the irregularity you are experiencing with first().
first() could arguably be adjusted to raise an exception when attempted to be used on an unordered collection to refuse the temptation to guess but that would require a deprecation period. Another approach could be to only raise an exception when first() is called on a unordered queryset with an explicit values() aggregation grouping that doesn't include pk which is the case reported here.
Here's what the code could look like for a systematic error on attempts to use first() on a undefined ordering queryset performing aggregation over specific columns
-
django/db/models/query.py
diff --git a/django/db/models/query.py b/django/db/models/query.py index 67fcb411eb..da3d1ff3a7 100644
a b async def alatest(self, *fields): 1032 1032 1033 1033 def first(self): 1034 1034 """Return the first object of a query or None if no match is found.""" 1035 for obj in (self if self.ordered else self.order_by("pk"))[:1]: 1035 queryset = self 1036 if not self.ordered: 1037 if isinstance(self.query.group_by, tuple): 1038 raise TypeError( 1039 "Cannot use first() on an unordered queryset performing aggregation" 1040 ) 1041 queryset = self.order_by("pk") 1042 for obj in queryset[:1]: 1036 1043 return obj 1037 1044 1038 1045 async def afirst(self):
And a somewhat more complex implementation that still allow the pk
-
django/db/models/query.py
diff --git a/django/db/models/query.py b/django/db/models/query.py index 67fcb411eb..0534a2cc16 100644
a b async def alatest(self, *fields): 1032 1032 1033 1033 def first(self): 1034 1034 """Return the first object of a query or None if no match is found.""" 1035 for obj in (self if self.ordered else self.order_by("pk"))[:1]: 1035 queryset = self 1036 if not self.ordered: 1037 if isinstance(self.query.group_by, tuple) and not any( 1038 col.output_field is self.model._meta.pk for col in self.query.group_by 1039 ): 1040 raise TypeError( 1041 "Cannot use first() on an unordered queryset performing aggregation " 1042 "over a set of expressions that doesn't include the base model's primary key" 1043 ) 1044 queryset = self.order_by("pk") 1045 for obj in queryset[:1]: 1036 1046 return obj 1037 1047 1038 1048 async def afirst(self):
The latter patch seems to be passing the test suite.
Thoughts on the value of such adjustment Carlton?
comment:6 by , 3 years ago
| Resolution: | needsinfo |
|---|---|
| Status: | closed → new |
| Triage Stage: | Unreviewed → Accepted |
| Type: | Uncategorized → Cleanup/optimization |
Thanks both. Good.
Thoughts on the value of such adjustment Carlton?
We do see tickets periodically running into this problem: I recall a number of occasions seeing links to that same docs section.
I guess for every one that we see there are several/many in the wild we don't
The first() docs link to the Interaction with order_by() section there, presumably to try to forestall this confusion, but I still get the impression it's one of the more subtle points of the ORM behaviour, so if we can provide a better error message to point folks in the right direction that's likely a win. 🤔
I'll provisionally accept on that basis. (Mariusz can object on review if he doesn't like it 😃)
Pablo would you like to work on a PR based on Simon's suggestion?
comment:7 by , 3 years ago
Yes, I could work on a PR this weekend. Maybe I can add a documentation note to the first function too.
This change will require a deprecation period right?
comment:8 by , 3 years ago
This change will require a deprecation period right?
I don't think so. We're (just) adding a warning for an already incorrect usage.
comment:9 by , 3 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:10 by , 3 years ago
| Has patch: | set |
|---|
Hi!
I've created this PR. I've also updated the last() function: https://github.com/django/django/pull/15769
comment:11 by , 3 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
This is happening because you don't explicitly define an ordering for your queryset so the database can return whatever rows it wants in the case of
[0]while.first()explicitly order bypkif no ordering is specified.Please TicketClosingReasons/UseSupportChannels before filing issues in this ticket tracker.